EDUCATIONAL AND PSYCHOLOGICAL 
MEASUREMENT 





Volume II OCTOBER, 1942 Number 4 





TWO ANNOUNCEMENTS


APTITUDE TESTS FOR ARMY WEATHER OBSERVER STUDENTS
Earle Cleveland, Richard W. Faubion, and 
Thomas W. Harrell 


THE OPTIMUM USE OF TEST DATA..... 339
Maurice Lorr and Ralph K. Meister 


A TECHNIQUE FOR TESTING UNDERSTANDING OF THE VISUAL 


Melvin W. Barnes 


SOME OF THE LESS MEASURABLE OUTCOMES OF EDUCATION..... 353
Edwin J. Brown 


THE AIMS, OBJECTIVES, AND OUTCOMES OF THE OHIO TESTING
[illegible]..... 361
Ray G. Wood 


EDUCATIONAL REQUIREMENTS AND OCCUPATIONAL LEVELS 
Richard D. Allen and Lester F. Krone 


THE PREDICTION OF SUCCESS OF STUDENT ASSISTANTS IN 
COLLEGE LIBRARY WORK
Grace M. Oberheim 


THE ADMINISTRATION OF GROUP TESTS
Ernest M. Ligon 


THE PURPOSE, ORIGIN, PLAN OF PROCEDURE, AND VALUES OF
THE NATION-WIDE EVERY PUPIL SCHOLARSHIP TESTS


H. E. Schrammel 


A TEST FOR SELECTING AND TRAINING INDUSTRIAL TYPISTS....409
Clifford E. Jurgensen 


[illegible]..... 427


INDEX FOR VOLUME II 











Copyright, 1942, by 
SCIENCE RESEARCH ASSOCIATES 





STATEMENT OF THE OWNERSHIP, MANAGEMENT, CIRCULATION, ETC., REQUIRED BY THE 
ACTS OF CONGRESS OF AUGUST 24, 1912, AND MARCH 3, 1933 

Of EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 

Published Quarterly at Chicago, Ill., for October 1, 1942. 


State of Illinois }
County of Cook    } ss.

Before me, a Notary Public, in and for the State and county aforesaid, personally appeared Walter 
A. Symons, who, having been duly sworn according to law, deposes and says that he is the Business 
Manager of the Educational and Psychological Measurement and that the following is, to the best of 
his knowledge and belief, a true statement of the ownership, management (and if a daily paper, the 
circulation), etc., of the aforesaid publication for the date shown in the above caption, required by the
Act of August 24, 1912, as amended by the Act of March 3, 1933, embodied in section 537, Postal Laws 
and Regulations, printed on the reverse of this form, to-wit: 


1. That the names and addresses of the publisher, editor, managing editor, and business managers 
are: Publisher, Science Research Associates, 1700 Prairie Avenue, Chicago; Editor, G. Frederic Kuder, 
1700 Prairie Avenue, Chicago; Managing Editor, John R. Yale, 1700 Prairie Avenue, Chicago; Business 
Manager, Walter A. Symons, 1700 Prairie Avenue, Chicago. 


2. That the owner is: (If owned by a corporation, its name and address must be stated and also 
immediately thereunder the names and addresses of stockholders owning or holding one per cent or more 
of total amount of stock. If not owned by a corporation, the names and addresses of the individual 
owners must be given. If owned by a firm, company, or other unincorporated concern, its name and 
address, as well as those of each individual member, must be given.) Ralph A. Bard, 208 S. LaSalle
St., Chicago, Ill.; Charles S. Boyd, Appleton Coated Paper Co., Appleton, Wis.; R. W. Glasner, 6499
W. 65th St., Chicago, Ill.; Alfred E. Hamill, 208 S. LaSalle St., Chicago, Ill.; Robert C. McNamara,
623 S. Wabash Ave., Chicago, Ill.; John I. Shaw, 135 S. LaSalle St., Chicago, Ill.; Lyle M. Spencer,
1700 Prairie Ave., Chicago, Ill.; Mrs. Dorothy Bard, c/o Roy E. Bard, 134 S. LaSalle St., Chicago,
Ill.; Roy E. Bard, 134 S. LaSalle St., Chicago, Ill.; George M. Bard II, c/o Ralph A. Bard, 208 S.
LaSalle St., Chicago, Ill.; Miss Janet Bard, c/o Ralph A. Bard, 208 S. LaSalle St., Chicago, Ill.;
Robert K. Burns, 1700 Prairie Ave., Chicago, Ill.; Miss Grace M. Wagner, c/o Richard Wagner,
135 S. LaSalle St., Chicago, Ill.; W. C. Winkel, c/o Modine Mfg. Co., Racine, Wis.


3. That the known bondholders, mortgagees, and other security holders owning or holding 1 per 
cent or more of total amount of bonds, mortgages, or other securities are: (If there are none, so state.)
None. 


4. That the two paragraphs next above, giving the names of the owners, stockholders, and security 
holders, if any, contain not only the list of stockholders and security holders as they appear upon the 
books of the company but also, in cases where the stockholder or security holder appears upon the books 
of the company as trustee or in any other fiduciary relation, the name of the person or corporation for 
whom such trustee is acting, is given; also that the said two paragraphs contain statements embracing 
affiant’s full knowledge and belief as to the circumstances and conditions under which stockholders and 
security holders who do not appear upon the books of the company as trustees, hold stock and securities 
in a capacity other than that of a bona fide owner; and this affiant has no reason to believe that any 
other person, association, or corporation has any interest direct or indirect in the said stock, bonds, 
or other securities than as so stated by him. 


5. That the average number of copies of each issue of this publication sold or distributed, 
through the mails or otherwise, to paid subscribers during the twelve months preceding the date shown 
above is (Not a daily publication.) (This information is required from daily publications only.) 

WALTER A. SYMONS, Business Manager. 

Sworn to and subscribed before me this 21st day of October, 1942. 

DOROTHA MOEHLE, Notary Public 


(SEAL) (My commission expires June 12, 1946.) 





PRINTED IN THE UNITED STATES OF AMERICA

























TWO ANNOUNCEMENTS 


Although Educational and Psychological Measurement is 
several months short of celebrating its second anniversary, the 
growth it has made in its brief life is notable. Now it is pos- 
sible to announce that another step forward has been taken. 
With this issue, Educational and Psychological Measurement 
for the first time goes to the members of the American College 
Personnel Association as their official journal. That this ar- 
rangement will result in a strengthening and broadening of 
the journal goes without saying. 


Educational and Psychological Measurement will continue 
to serve the whole field of measurement as applied in educa- 
tion, industry, and government, and the pages of the journal 
will continue to be open to contributions from the entire field. 
In the past a number of outstanding articles have been con- 
tributed by members of the American College Personnel Asso- 
ciation, although there was no tangible relation between the 
Association and the journal. It is a source of satisfaction that 
beginning with the January, 1943, issue the Association will 
be represented regularly by contributions from its membership 
in accordance with the new arrangement. A section on news of 
the Association will also appear in future issues. 


An announcement to the members of the American College
Personnel Association from its president follows. 


G. FREDERIC KUDER, Editor


To the Members of the American College Personnel
Association:

It is with a great deal of assurance regarding the future of
our Association that I announce that an almost unanimous
ballot approving the Executive Council’s recommendation re-
garding our affiliations with this magazine has been received.
The tremendous pressures that have been building up on our 
many members with regard to war work have caused your 
Executive Council to spend considerable time thinking of ways 
of fortifying our Association during the war period to the 
end that our corporate existence would continue. With the 
unusually fine co-operation of the editorial board of this maga- 
zine and of Science Research Associates we can be assured 
of more frequent contact with each other. 


As president of your Association I want publicly to 
acknowledge the splendid work done by our Secretary, Dr. 
Feder, who conceived and initiated the plan for affiliation 
with this magazine as our official publication. Now Dr. Feder 
can go to war with the satisfaction of a job well done. 


Unfortunately, the ballot voting was not completed in 
time for materials to be prepared for this issue. In future 
issues, however, articles and news notes will be presented. 
Grace E. Manson, Director of Personnel Research, North- 
western University, Evanston, Illinois, has agreed to act as 
editor for the A. C. P. A. section of the journal. All of us 
owe Dr. Manson a debt of gratitude for undertaking this 
service to our organization. I hope that each of you will feel 
responsible for making suggestions to this editor with regard 
to desirable materials to be included. News notes should also 
be sent to her. Please feel that your Executive Council wishes 
to do everything possible to further the work of our indi- 
vidual members and the continued strength of our Association. 

Your voting was almost unanimous in favor of a restricted
meeting in St. Louis in February. It appears that other per-
sonnel associations will follow our lead. We hope that this
will mean that our program, though restricted, may be a sig-
nificant one. The Proceedings in abbreviated form most likely
will appear in this journal.


As each of you finds yourself applying your personnel skills 
to the war work, may I urge you to keep clearly in mind that 
our contributions to higher education are needed more than 
ever these days for two reasons. First, all that college life 
adds to the maturing of students is as necessary in wartime 
as in peacetime. Our contribution through counseling of col- 
lege students is being more clearly seen as a significant part of 
higher education. Second, we need to strengthen our tech- 
niques and our Association in preparation for the after-war 
period when tremendously increased enrollments may present 
the colleges with rich opportunities for affecting in a greatly 
increased manner the welfare of war students. 


Your Executive Council extends to each of you a most 
cordial congratulation on your significant contributions to the 
war and wishes each of you continued effectiveness. 


Cordially yours, 


E. G. WILLIAMSON, President 
American College Personnel Association 






















APTITUDE TESTS FOR ARMY WEATHER 
OBSERVER STUDENTS 


EARLE CLEVELAND and CAPTAIN RICHARD W. FAUBION 
Army Air Forces, Technical Training Command
and 


THOMAS W. HARRELL 
University of Illinois 


TWO groups of weather observer students in the Army
Air Forces Technical Schools have been studied to find
those tests which would be most predictive of success in the 
course. 

The first group of students, numbering 116, entered the 
course in August 1940; the second, numbering 73, entered in 
November 1940. These students, like others described in 
previous studies² of selection procedures in the Army Air
Forces Technical Schools, were selected on the basis of their 
being high-school graduates, with a score on a revised form of 
Alpha equivalent to a percentile rank of 75 and with a mini- 
mally acceptable score on a shop mathematics test. 

The criteria with which the prediction test scores were 
compared are two grades in the weather observer course: the 
first based on a meteorology examination given about three 
weeks after the beginning of the course and the second being 
the final average for the three-month course. The grade on 
the meteorology examination correlated .70 with the final 
course average. The weather observer course covered the fol-
lowing topics:

1. Wind-aloft charts
2. Atmospheric soundings
3. Upper air observations
4. Plotting map signals
5. Surface observations
6. Weather forms
7. Weather instruments

¹ A paper read at the 1941 meeting of the Midwestern Psychological Asso-
ciation.

² W. Harrell and R. Faubion. “Selection Tests for Aviation Mechanics,”
Journal of Consulting Psychology, IV (1940), 104-105.

W. Harrell and R. Faubion. “Primary Mental Abilities and Aviation
Maintenance Courses.” Educational and Psychological Measurement, I (1941),
59-66.


After the content of the course had been studied, a tenta-
tive series of tests was chosen. These tests were:


(1) Mental Alertness. An adaptation of the Henmon-Nelson Test 
for high-school students in which some of the items have been 
changed. 

(2) Scattered X’s. A measure of perceptual speed in which the 
problem is to cross out the x’s placed at random on a page of pied 
type. 

(3) Identical Numbers. A measure of perceptual speed in which 
the problem is to select which numbers in a column are identical 
with the number at the top of the column. 


(4) Algebra. A standard test of algebra. 

(5) Meteorological Achievement. 50 true-false items based upon 
material used in the Weather Observer course. (This test is not to 
be confused with the meteorology examination, which is one of the 
criteria.)

(6) Physics Achievement. 144 true-false items based upon mate- 
rial used in the Weather Observer course. 

(7) Surface Development. Six problems, each with six parts, in 
which a picture and a diagram of a simple object are shown, the 
problem being to match corresponding parts of the picture and the 
diagram. 

(8) Flags. 48 items in which the problem consists of deciding 
whether pairs of pictures of flags represent the same or opposite 
faces of the flags. 


(9) Mechanical Movements. 22 problems based on pictures of 
various mechanical movements as, for instance, a question about the 
direction in which oil will be forced, based upon a picture of the 
gears of a rotary oil pump. 

(10) Cubes. 32 problems in each of which the task is to distinguish 
whether or not two drawings represent the same cube turned to 
different positions. 


Tests 2, 3, 7, 8, 9, and 10 are taken from Dr. L. L. Thurs- 
tone’s Primary Mental Abilities study and were used with his 
permission. Test 4 was also used with Dr. Thurstone’s per- 
mission. 




For the students entering in August, the means and the 
standard deviations of each of the ten prediction tests and of 
the two criteria as well as the zero-order correlations between 
the predictor variables and the criteria are shown in Table 1. 


TABLE 1

MEANS, STANDARD DEVIATIONS, AND CORRELATIONS FOR TEST SCORES
AND GRADES FOR 116 W. O. STUDENTS

                                                       Correlation    Correlation
                                                          with           with
                                          Standard     Meteorology    Final Course
Variables                      Mean       Deviation       Exam.         Average

Meteorology Examination        68.5         16.8           --             .70
Final Course Average           80.6          4.9           .70            --
Mental Alertness               70.0         10.8           .39            .39
Scattered X’s                  26.2          9.2       [illegible]    [illegible]
Identical Numbers              50.6          6.3           .28        [illegible]
Algebra                         3.1      [illegible]       .47            .41
Meteorological Achievement     10.8          7.3           .40            .40
Physics Achievement            33.9         24.0           .55            .45
Surface Development            21.4          6.8           .18            .11
Flags                          24.5         10.7           .16            .10
Mechanical Movements       [illegible]      12.0           .41        [illegible]
Cubes                          15.8          6.8       [illegible]    [illegible]





It will be noted that, in spite of the previous selection of the 
subjects by means of a test of mental ability, the mental alert- 
ness test correlated .39 with each of the two criteria. The 
meteorological achievement test, devised by the Classification 
Division, A.A.F.T.T.C., to measure meteorological concepts, 
correlated .40 with the grade on the meteorology examination. 
The physics achievement test and the algebra test correlated 
.55 and .47, respectively, with the same criterion. Other cor- 
relations between test scores and the grade in meteorology 
were positive but not so high. The Surface Development Test, 
which has consistently correlated significantly with grades in 
the basic mechanical course at Air Corps Technical Schools, 
correlated positively but insignificantly with the Weather 
Observer course grades, which is not inconsistent with what 
one would expect. 




A multiple correlation between the grade on the meteor- 
ology examination and the best combination of the tests re- 
sulted in a correlation coefficient of .63. The tests included 
were mental alertness, meteorological achievement, physics 
achievement, and algebra. A combination of the mental alert- 
ness, the meteorological achievement, and the physics achieve- 
ment tests yielded a multiple correlation coefficient of .62. The 
difference between the two coefficients was not enough to war- 
rant the additional testing time necessary for the algebra test. 
The regression equation was 


$$X_1 = .29\,X_2 + .38\,X_3 + 30.34,$$

where $X_1$ = the most probable meteorology examination
              grade,

      $X_2$ = the meteorology aptitude test score (meteor-
              ological achievement plus physics achieve-
              ment),

      $X_3$ = the mental alertness test score.


This regression equation, based on the August class, was 
used to predict the results for the class entering in November. 
Of the 73 November students, 10 were eliminated before the 
completion of the course; of these 10, 8 fell below a critical 
level of 55, calculated from the regression equation. Of the 
63 students who completed the course, nine passed who were 
predicted to fail. 
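
The prediction step described above can be sketched in a few lines. This is an illustration only and not the authors' own computing routine: the function and variable names are invented here, the pairing of the weights .29 and .38 with the aptitude and mental alertness scores is assumed from the order in which the printed equation lists its terms, and the sample scores are the August class means from Table 1 rather than any student's record.

```python
# Illustrative sketch; names are hypothetical, not taken from the paper.
CRITICAL_LEVEL = 55.0  # critical predicted grade quoted in the text


def predicted_exam_grade(aptitude: float, alertness: float) -> float:
    """Most probable meteorology examination grade.

    aptitude  -- meteorological achievement plus physics achievement score
    alertness -- mental alertness test score
    The pairing of weights with variables is an assumption (see lead-in).
    """
    return 0.29 * aptitude + 0.38 * alertness + 30.34


def predicted_to_fail(aptitude: float, alertness: float) -> bool:
    """True when the predicted grade falls below the critical level."""
    return predicted_exam_grade(aptitude, alertness) < CRITICAL_LEVEL


# August class means from Table 1: 10.8 + 33.9 for aptitude, 70.0 for alertness.
mean_prediction = predicted_exam_grade(10.8 + 33.9, 70.0)

# November class counts reported in the text.
eliminated, eliminated_flagged = 10, 8
completed, completed_flagged = 63, 9
share_of_eliminated_flagged = eliminated_flagged / eliminated
share_of_completers_flagged = completed_flagged / completed
```

Under these assumptions the equation flags eight tenths of the eliminated students and about one seventh of those who completed the course.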


Conclusions: 

(1) Evidence is given from a cross-validation study that 
an examination made up of mental alertness, meteorology, and 
physics questions significantly improves the selection of 
weather observer students in the Army Air Forces Technical 
Schools. 

(2) One of the tests, Surface Development, which has 
been shown to be predictive of basic airplane mechanics grades, 
does not correlate significantly with weather observer grades. 
This suggests the specificity of requirements for the various 
training courses within the Army Air Forces Technical Schools 
and indicates that a single selection procedure is inefficient. 










THE OPTIMUM USE OF TEST DATA¹


MAURICE LORR 


and 
RALPH K. MEISTER 


THE procedures conventionally adopted in administering
and scoring age scales of the Binet type are often waste-
ful of time and test materials. For many practical situations, a
more economical procedure is much in need. It is the purpose 
of this paper to describe a briefer method of administering 
and scoring age scales, to indicate the several advantages of 
the newer method, and to present comparisons of the results 
of this newer method applied to the Revised Stanford Binet 
with results from the conventional form of the scale and from 
the abbreviated scales. 


The rationale for the method to be described is directly 
derived from the fundamental relationships between the field 
of mental test theory and the methods of psychophysics, in 
particular the constant method of psychophysics. Let us first 
briefly review the procedure in determining a sensory threshold 
such as the two-point tactual limen by the constant method. 
An appropriate range of stimuli that are judged neither “two”
nor “one” 100 per cent of the time is selected. Each stimulus
is then administered to the subject by means of the aesthesi-
ometer a large number of times in a prearranged order. The
subject judges the presence or absence of the desired experi-
ence, which is “two.” The responses are then classified and the
relative frequencies of judgments of “two” and “one” for each
grade of the stimulus scale are determined. The limen, which
may be computed by the constant process, represents a transi-
tion zone between stimuli too weak to arouse a response of
“two” and stimuli strong enough to elicit a response of “two.”
Conventionally the stimulus value that elicits a response of
“two” 50 per cent of the time is regarded as the stimulus
limen.

¹ The authors wish to express their thanks to Dr. Martin L. Reymert,
Director of the Mooseheart Laboratory for Child Research, for his generous
permission to use data from its files.

Now let us consider in the mental age scale the groups of 
items, supposedly equal in difficulty, that are allocated to each 
year level as representing the typical performance of indi- 
viduals of the corresponding chronological age. These items 
are such that the response is classed as either “correct” or
“incorrect.” Each item may be regarded as having a charac-
teristic response-value that differentiates “incorrect” from
“correct” responses, and thus each item requires for a “cor-
rect” response a given degree of ability as expressed in terms
of a certain age group. This response-value corresponds to
the stimulus value of psychophysical discrimination. Theoreti-
cally if an individual were presented with items ordered as to
difficulty and if his responses were made without measuring
error, correct responses would be made up to a certain point
on the scale depending upon the individual's ability. “Incor-
rect” responses would be made to all items beyond this point,
and the scale value of this point—which corresponds to the 
psychophysical limen—would represent a measure of the in- 
dividual’s ability or intelligence. 

Actually of course, as with sensory thresholds, no such 
point exists. In actual practice, instead of this sharp 
theoretical division we obtain mixed successes and failures over 
a number of year levels. It has been pointed out (1) that 
such irregularity of performance or scatter is in part a con- 
sequence of the lack of perfect correlation between items 
resulting from a lack of homogeneity and from the presence of 
error. In the mental test situation, therefore, it is seen that 
the response process may be regarded as a composite consisting 
of a characteristic or “true” component of ability and an error 
component. When the ability component of the individual plus 
the chance error component of his response is greater than the
level of ability required to pass the item, he answers correctly;
when this composite is less, he answers incorrectly. The dis- 
crepancy between the actual level of the response and the 
assumed true value, of course, constitutes the error. 


It is evident from these considerations that the psycho- 
physical method of constant stimuli for determining stimulus 
thresholds is applicable to such mental test data. Thus, when 
items are arranged in order of difficulty for standard age 
groups, and the response of any individual to any item can 
take only two values such as “correct” or “incorrect,” the
frequency distribution of responses as a function of item 
difficulty may be assumed to be the integral of the normal 
probability curve. The characteristic response-value or test 
limen of that individual (his mental age score equivalent) will 
be that difficulty value expressed in terms of age that yields 
“correct” responses fifty per cent of the time. The individual's 
variability or error will be the standard deviation of the prob- 
ability function described (2). 
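
The response model just described can be illustrated numerically. The sketch below assumes, as the text does, that the probability of a correct response follows the integral of the normal probability curve; the function name, the example limen of 8.0 years, and the individual variability of 1.0 year are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def p_correct(ability_age: float, item_age: float, sd: float) -> float:
    """Probability of passing an item placed at item_age.

    The response is correct when ability plus a normally distributed
    error exceeds the item's difficulty, so the probability is the
    normal integral evaluated at (ability - difficulty), with the
    individual's variability sd as the standard deviation.
    """
    return NormalDist(mu=0.0, sigma=sd).cdf(ability_age - item_age)


# Hypothetical child: test limen 8.0 years, individual variability 1.0 year.
curve = {age: round(p_correct(8.0, age, 1.0), 2) for age in (6, 7, 8, 9, 10)}
```

The proportion passing falls steadily as item difficulty rises and equals one half at the item age that matches the limen, which is the property the liminal method relies upon.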


Thus, by simply computing a test limen for an individual 
in terms of the age level at which he passes 50 per cent of the 
items, we have an alternative method of determining mental 
age. This procedure is much shorter than that required for 
the full scales and even shorter than that required for the 
abbreviated scales. The test limen or mental age is determined 
either by (a) the single age level at which the individual passes 
50 per cent of the items, or by (b) simply interpolating for 
the 50 per cent point which falls between the age level at 
which he has passed more than 50 per cent and the next higher 
level where he has passed less. Linear interpolation is justi- 
fiable here since in the range concerned the curve of per cent 
passing is practically linear, all of the data will ordinarily be 
employed, and a measure of individual scatter is not desired. 
The limen or mental age score may be computed by linear 
interpolation by the following formula: 


$$\text{M.A.} = a_m + \frac{(a_l - a_m)(p_m - .50)}{p_m - p_l},$$

where $a_m$ = the age at which more than 50 per cent of the
              items were passed,
      $a_l$ = the age at which less than 50 per cent of the items
              were passed,
      $p_m$ = the per cent of correct responses at $a_m$, expressed
              as a decimal fraction,
      $p_l$ = the per cent of correct responses at $a_l$, expressed
              as a decimal fraction.
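
A minimal sketch of this interpolation follows. The function name is invented for illustration, and the formula is written in the form that reproduces the article's worked example of five items passed at age seven and two at age eight, which yields seven years and eight months.

```python
def limen_mental_age(a_m: float, a_l: float, p_m: float, p_l: float) -> float:
    """Interpolate the age at which 50 per cent of the items would be passed.

    a_m, p_m -- age level, and proportion passed, where more than half were passed
    a_l, p_l -- next higher age level, and proportion passed, where less than half were passed
    Ages may be in years or months so long as both use the same unit.
    """
    if not (p_m > 0.50 > p_l):
        raise ValueError("need p_m > .50 > p_l to interpolate")
    return a_m + (a_l - a_m) * (p_m - 0.50) / (p_m - p_l)


# Worked example from the article: 5 of 6 items at age seven, 2 of 6 at age eight.
ma_months = limen_mental_age(7 * 12, 8 * 12, 5 / 6, 2 / 6)
years, months = divmod(round(ma_months), 12)
```

When exactly half of the items are passed at some level, no interpolation is needed and that level is itself the mental age, so the function is written to reject that case.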


The procedure of the examination itself is as follows: 
Begin testing at the age level where the child is likely to pass 
half and fail half of the items. On the average this age level 
will be within six months of the child’s chronological age. 
The examiner should, of course, also take into account the 
grade placement, general behavior, and any additional facts 
available concerning the child’s ability. If the child responds 
correctly to 50 per cent of the items at the age level where 
testing is begun, his test limen or mental age score is exactly 
that year level, and the test is completed. Should the child 
respond correctly to more than 50 per cent of the items, the 
items at the next higher (older) level are administered. Test-
ing is continued until 50 per cent of the items or less are
passed, and in most cases only one additional level is required. 
In only a few cases is more than one additional level of testing 
required. This is what might be expected from an assumption 
of a normal distribution of intelligence in the general popula- 
tion. 

Should the subject respond correctly to less than 50 per
cent of the items at the level where testing is begun, the items 
at the next lower (younger) level are administered and the 
examination is continued until a point is reached where the 
subject passes either 50 per cent or more than 50 per cent of 
the items. It should be noted that one level determines the 
test limen or at most two. If the subject has been tested 
through three or four levels before a limen determination is 
possible, as may sometimes happen, the only data that are to 
be used in determining his mental age score are the level at 
which he passes 50 per cent of the items or the two adjacent 
levels at which he passes more and less than 50 per cent of the 
items, respectively. The rest of the test data is ignored. 
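
The order of testing described above may be sketched as a short routine. It is a hedged illustration and not a scoring program from the article: `administer_by_limen` and the `proportion_passed` callback are hypothetical names, the callback merely stands in for actually giving the items at a level, and the sample record of passes is invented.

```python
from typing import Callable, List, Tuple


def administer_by_limen(
    levels: List[float],
    start: int,
    proportion_passed: Callable[[float], float],
) -> Tuple[float, List[float]]:
    """Follow the up-or-down rule; return (mental age, levels administered).

    levels            -- age levels in ascending order, all in the same unit
    start             -- index of the level at which testing begins
    proportion_passed -- hypothetical callback giving the proportion of items
                         passed at a level (a stand-in for real testing)
    """
    i = start
    p = proportion_passed(levels[i])
    given = [levels[i]]
    if p == 0.50:
        return levels[i], given
    step = 1 if p > 0.50 else -1  # go to older levels after success, younger after failure
    while 0 <= i + step < len(levels):
        prev_level, prev_p = levels[i], p
        i += step
        p = proportion_passed(levels[i])
        given.append(levels[i])
        if p == 0.50:
            return levels[i], given
        if (p > 0.50) != (prev_p > 0.50):  # the 50 per cent point lies between the two levels
            (a_m, p_m), (a_l, p_l) = sorted([(prev_level, prev_p), (levels[i], p)])
            return a_m + (a_l - a_m) * (p_m - 0.50) / (p_m - p_l), given
    raise ValueError("no limen can be determined within the available levels")


# Invented record: proportion of six items passed at ages 6 to 9 (in months).
record = {72: 6 / 6, 84: 5 / 6, 96: 2 / 6, 108: 0 / 6}
levels = sorted(record)
ma_up, given_up = administer_by_limen(levels, 1, record.__getitem__)
ma_down, given_down = administer_by_limen(levels, 3, record.__getitem__)
```

Only the last two levels visited enter the score; earlier levels affect the amount of testing but not the resulting mental age, as the text specifies.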




As Terman has stated, mental ages beyond fifteen are 
artificial and are to be regarded as simply numerical scores. It 
was decided by the present authors to express test limens 
beyond the age of fourteen in terms of these artificial mental 
ages instead of chronological age. The individual’s test limen 
is, therefore, simply the mental age level at which 50 per cent 
of the items are passed. The problem arises as to what mental 
ages should be assigned to the Average Adult and Superior 
Adult I, II, and III levels. At lower age levels, such as four- 
teen, a child passing half of the items at that level is credited 
with a mental age of fourteen by the liminal method. Similarly, 
individuals passing half of the test items at the upper levels 
should be assigned the following mental age scores: 

Average Adult 15 years, 4 months 
Superior Adult I 17 years, 4 months 


Superior Adult II 19 years, 10 months 
Superior Adult III 22 years, 10 months 


MENTAL AGE INTERPOLATION TABLES GIVING THE NUMBER OF MONTHS
CREDIT TO BE ADDED TO THE LOWER AGE LEVEL TO DETERMINE
A GIVEN MENTAL AGE FOR A CHILD

1. For half-year levels from II thru V

Items passed at        Items passed at higher age level
lower age level            0            1            2
       4              [illegible]       2            3
       5                   2            3            4
       6                   3            4            5

2. For levels from V thru XIV

Items passed at        Items passed at higher age level
lower age level            0            1            2
       4                   3            4            6
       5                   5            6            8
       6                   6            7            9

3. Items passed at Average Adult level

Items passed           Items passed at Average Adult level
at XIV                     0            1            2            3
       4                   4            5            6            9
       5                   6            8            9           12
       6                   8            9           11           13

4. Items passed at Superior Adult I

Items passed           Items passed at Superior Adult I
at Aver. Adult             0            1            2
       5                   5            6           10
       6                   8           10           14
       7                  10           13           17
       8                  12           14           18

5. Items passed at Superior Adult II

Items passed           Items passed at Superior Adult II
at S. A. I                 0            1            2
       4                   8           10           15
       5                  12           15           20
       6                  15           18           23

6. Items passed at Superior Adult III

Items passed           Items passed at Superior Adult III
at S. A. II                0            1            2
       4                   9           12           18
       5                  12           18           24
       6                  18           22           27



















Tables have been prepared to facilitate the process of 
interpolating for the test limen when it falls between the lower 
(younger) age level at which the child has passed more than 
half of the items, and the next higher (older) age level where 
he has passed less. The number of items passed at the lower 
age level is given at the left and the number at the higher age 
level is given at the top of each table. The body of each table 
gives the number of months of credit (to the nearest whole 
number) to be added to the age corresponding to the lower 
age level. For instance, if a child passes five items at age seven 
and two items at age eight, we enter Table 2 at the left and the 
top, to find that the second row and the third column intersect 
at the value 8. Thus to the lower age level of seven is added 
eight months to yield a mental age of seven years and eight 
months. Table 1 is to be used for determining limens that fall
within the age range, II through V, where each half-year is 
regarded as a separate age level. A table for interpolating 
between IV-6 and V is unnecessary since its values are the same 
as those in Table 1. Table 2 is to be used for finding limens 
within the age range, V through XIV. In each instance the 
number of items passed below the limen, i.e., at the lower age 
level, is found at the left of the table. Tables 3, 4, 5, and 6 
are self-explanatory. There will, of course, be a few instances 
in which the liminal method and the process of interpolation 
are impossible, as for example, when four items are passed at 
Superior Adult III level. 
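
The entries of Table 2 can be regenerated from the interpolation rule, which offers a check on the worked example above. The sketch assumes six items at each level and twelve months between levels for the range V through XIV; the function name is invented, and rounding a half upward is an assumed convention rather than one stated in the article.

```python
def months_credit(passed_lower: int, passed_higher: int,
                  items_lower: int, items_higher: int,
                  gap_months: int) -> int:
    """Months to add to the lower age level, to the nearest whole month."""
    p_m = passed_lower / items_lower     # proportion passed at the lower level
    p_l = passed_higher / items_higher   # proportion passed at the higher level
    if not (p_m > 0.5 > p_l):
        raise ValueError("need more than half passed below and less than half above")
    exact = gap_months * (p_m - 0.5) / (p_m - p_l)
    return int(exact + 0.5)  # round half up (assumed convention)


# Levels V through XIV: six items per level, twelve months apart.
table_2 = {
    lower: [months_credit(lower, higher, 6, 6, 12) for higher in (0, 1, 2)]
    for lower in (4, 5, 6)
}
example = months_credit(5, 2, 6, 6, 12)  # five items at seven, two at eight
```

The same function serves the adult levels if the item counts and the gap in months between the two levels are changed accordingly, although printed entries may differ by a month where the authors rounded differently.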

In order to compare the liminally determined mental ages 
with those conventionally computed on the full and abbreviated
scale, one hundred Revised Stanford Binet test folders were
chosen at random from the younger age group in the current 
files of the Mooseheart Laboratory for Child Research. Tests 
chosen were restricted to those of children from approxi- 
mately seven to eleven in order to avoid limen determinations 
at the adult levels where the changed scoring rationale would 
obscure the basic comparison desired. Successes and failures 
at each age level in the administration of the full scale were 
recorded on cards together with the C.A., M.A., and I.Q.
for that test. Then, by a consideration of only those items 
which are part of the abbreviated scales, each test was re- 
scored and a M.A. and an I.Q. calculated for an assumed 
abbreviated scale administration of the same test. Then these 
same tests were rescored a second time and assigned a mental 
age score determined by the test limen as described above, and 
a corresponding I.Q. 

Assuming the full scale as the standard (and disregarding 
the form, L or M), the abbreviated scale M.A. scores and the 
test limen mental age scores were each correlated with the 
M.A.’s of the full scale. The correlation coefficients were .98 
for the abbreviated scales and .91 for the test limen method. 
Then the I.Q.’s corresponding to these ages were correlated 
with the I.Q.’s on the full scale. The coefficient of correlation 
between I.Q.’s on the full and abbreviated scales was .97 and 
that between I.Q.’s on the full and the test limen scales was 
.83. The correspondence of scores may be further judged 
from the fact that the mean absolute discrepancy between the 
ratings on the long form and on the liminal form was a little 
less than 7 points. In 75% of the cases the discrepancy was 
less than 10 points; in 91% less than 15 points; and in 97%
less than twenty. 
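
The comparison just described can be sketched in a few lines of Python. This is only an illustration: the score lists are hypothetical stand-ins for the one hundred Mooseheart records, which are not reproduced in the article, and the function names are ours. The sketch computes a product-moment correlation, the mean absolute discrepancy, and the share of cases falling under each discrepancy cutoff.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def discrepancy_summary(full, short, cutoffs=(10, 15, 20)):
    """Mean absolute discrepancy and share of cases below each cutoff."""
    diffs = [abs(a - b) for a, b in zip(full, short)]
    mean_abs = sum(diffs) / len(diffs)
    share_below = {c: sum(d < c for d in diffs) / len(diffs) for c in cutoffs}
    return mean_abs, share_below


# Hypothetical I.Q.'s for six children (NOT the study data).
full_iq = [95, 102, 110, 121, 88, 134]
limen_iq = [99, 97, 118, 115, 90, 122]

r = pearson_r(full_iq, limen_iq)
mean_abs, share_below = discrepancy_summary(full_iq, limen_iq)
print(round(r, 2), round(mean_abs, 1), share_below)
```

With the actual records in place of the hypothetical lists, the same two functions would give the kind of figures reported above.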

The savings in time of testing may be appreciated from 
the fact that for these one hundred cases the average number 
of levels administered in the full scale was 6.6 while the 
average for the test limen determination was only 2.2 or 
roughly a third. Even when the abbreviated scales are used 
and only four out of six tests at each level are given, the 




number of tests is cut only one third. Thus, the average num- 
ber of tests administered in the abbreviated scales for this 
sample is still twice that required for the test limen determina- 
tion. 
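
The arithmetic behind this comparison may be made explicit. The sketch below uses the averages reported for the sample (6.6 and 2.2 levels) and the stated four-of-six abbreviation. It assumes, as the text does not say outright, that all six items are given at each level used for the limen determination.

```python
ITEMS_PER_LEVEL = 6        # items at each age level of the full scale
ABBREVIATED_PER_LEVEL = 4  # items retained at each level in the abbreviated scale

full_levels = 6.6   # average number of levels given, full scale
limen_levels = 2.2  # average number of levels given, limen determination

full_items = full_levels * ITEMS_PER_LEVEL
abbreviated_items = full_levels * ABBREVIATED_PER_LEVEL
# Assumption: every item at a limen level is administered.
limen_items = limen_levels * ITEMS_PER_LEVEL

print(round(full_items, 1), round(abbreviated_items, 1), round(limen_items, 1))
print(round(abbreviated_items / limen_items, 2))  # abbreviated relative to limen
```

On these assumptions the abbreviated scale requires about 26 items on the average and the limen determination about 13, which is the two-to-one relation stated above.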

In order to determine how this cutting of the length of the 
test affected its reliability, out of the one hundred tests origi- 
nally selected 31 pairs were chosen which represented succes- 
sive administrations to the same individual child. In this way 
data for reliability calculations upon a test-retest basis were 
secured. It should be noted, however, that between test and 
retest there was an interval of elapsed time of about one year 
and that these reliability coefficients might be expected to be
lower than those of the usual test and retest immediately fol- 
lowing because of the changes in the individual and the condi- 
tions of testing and even the change in the content of the test 
itself since items of the same degree of difficulty were not 
administered the second time. The reliability of the I.Q. score 
for the long form of the test for the 31 cases was .86, for the 
abbreviated scale .75, and for the limen form .61. This drop 
in reliability with the use of fewer and fewer items for the 
determination of the score is roughly in accordance with the 
expectations derived from the Spearman-Brown prophecy 
formula. 
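
The expectation from the Spearman-Brown formula can be checked roughly as follows. The length ratios are our assumptions, not figures given in the text: two thirds for the abbreviated scale (four of six items at each level) and one third for the limen form (2.2 of 6.6 levels). The function name is illustrative.

```python
def spearman_brown(r_full, length_ratio):
    """Predicted reliability when a test is changed in length.

    r_full       -- reliability of the original test
    length_ratio -- new length divided by original length (below 1 shortens)
    """
    return length_ratio * r_full / (1 + (length_ratio - 1) * r_full)


R_FULL = 0.86  # reported reliability of the long form (31 retest pairs)

predicted_abbreviated = spearman_brown(R_FULL, 4 / 6)    # assumed ratio
predicted_limen = spearman_brown(R_FULL, 2.2 / 6.6)      # assumed ratio

print(round(predicted_abbreviated, 2))  # about 0.80; observed .75
print(round(predicted_limen, 2))        # about 0.67; observed .61
```

Under these assumed ratios the predicted values run somewhat above the observed coefficients, which agrees with the description of the correspondence as rough.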


The equivalence of the mental age scores secured by the 
two methods of scoring may also be judged upon the basis of 
the statistics presented below. 
















                       I.Q.                     M.A.
                         Standard                 Standard
Form           Mean      Deviation      Mean      Deviation
Full           108.1       16.1         113.53      22.5
Abbreviated    106.4       15.7         114.92      22.6
Liminal        108.9       14.0         114.37      19.6














The I.Q. and the M.A. means are practically the same for the
three sets of scores. The variability of the I.Q.’s and the 
M.A.’s naturally decreases as we pass from the full form to 




abbreviated and liminal forms because of the reduction in the 
length of the test. 


In the evaluation, on the basis of these data, of the limen 
method of scoring as opposed to the simple addition of number 
right, it should be pointed out that this estimate of its effective- 
ness is specific to the Revised Stanford Binet and is affected by 
the extent to which the Revised Stanford Binet satisfies or 
does not satisfy the conditions for a true difficulty scale. 
Though the test limen method of scoring assumes a series of 
items ordered in difficulty, this condition is only approximately 
met in the Revised Stanford Binet in the sense that though 
items at age ten are invariably more difficult than items at age 
five, yet between adjacent age levels there are inversions, as a 
number of empirical studies have shown. Such a condition 
might have been expected from the limitations imposed upon 
the placement of items. In general they had to be placed 
according to difficulty. In addition, the demands of variety, 
interest, etc., had to be satisfied for each age level. It is quite 
probable that if an age scale homogeneous as to content and 
rigidly ordered as to difficulty were obtainable, the liminal 
method would provide the most reliable measure of an indi- 
vidual’s performance on a limited number of items. This 
would be the case since the individual is scored upon the basis 
of his performance on items that are of 50 per cent difficulty 
for him. 


It will be a matter of concern to some examiners that there 
is no spread of performance to be analyzed. Or perhaps they 
will feel the need of some measure of individual variability. 
However, as has been pointed out (1), the practice of
inspectional analysis of individual successes and failures to
secure a crude estimate of the individual’s “primary” abilities
is at best questionable. Such scattering of passes and failures 
is based for the most part on factors inherent in the test, in 
test construction, and in systematic errors. Furthermore, 
measures of individual variability on the Revised Stanford 
Binet possess no unique significance for the individual (3). 




The abbreviated method of constant stimuli thus enables 


an examiner to secure mental age scores reasonably equivalent 
to those obtained by the conventional method of scoring of 
the full scale, in half the time ordinarily required. Of course, 
when administration time is ample, the full Stanford Binet 
should be given. However, occasions arise in the clinic and in 
the field when time is at a premium. At such times use of the 
shorter method enables the examiner to administer a test that 
would not otherwise be possible at all. 


REFERENCES

1. Lorr, Maurice and Meister, Ralph K. “The Concept of Scatter in
the Light of Mental Test Theory,” Educational and Psychological
Measurement, I (1941), 303-310.

2. Mosier, Charles I. “Psychophysics and Mental Test Theory:
Fundamental Postulates and Elementary Theorems,” Psychological
Review, XLVII (1940), 355-366.

3. McNemar, Quinn. The Revision of the Stanford Binet Scale (With
an introductory chapter by L. M. Terman). New York: Houghton
Mifflin, 1942, 185.







A TECHNIQUE FOR TESTING UNDERSTANDING 
OF THE VISUAL ARTS 


MELVIN W. BARNES 


University of Illinois 


IN 1941-1942, the University of Illinois offered a survey
course in Literature and Fine Arts. This course, open to 
sophomores, is one of seven full-year courses which now com- 
prise the lower-college program of the General Division of 
the College of Liberal Arts and Sciences. One unit of the 
course in Literature and Fine Arts is devoted to the study of 
painting. In an attempt to appraise achievement in this phase 
of the year’s work a testing technique was evolved which it is 
the purpose of this note to describe. 


In the conduct of the course the students were brought into 
contact with a wide variety of paintings by means of lantern 
slides. These works of art were studied in terms of a four- 
fold scheme of analysis: color, composition, expression, and 
function. In some instances, a work was studied first in black 
and white for the purpose of emphasizing composition before 
it was considered in color. On a number of occasions two or 
more representations of a single theme or incident by different 
artists were studied comparatively. Since the basic aim of the 
course was to cultivate understanding and thereby—it was 
hoped—appreciation, little time was spent on artists, history, 
or techniques of painting. In addition to the reproductions 
used in the classroom, materials owned by the Department of 
Art and others in the University museums were made available 
to the students. The course was rounded out with a day at the 
Chicago Art Institute. 





When the problem of testing achievement in this course 
arose, the following procedure was devised. So far as the 
writer knows, the device is unique. 




The technique employed two projecting lanterns to throw 
simultaneously two paintings in color on adjacent screens 
placed in the front of the classroom. By this method colored 
reproductions of a size approximately three feet by five were 
placed side by side in a way which permitted every member 
of the class to see them clearly. The paintings had not been 
seen by the class before the time of the test. The test, which 
was mimeographed, was based upon points of similarity and 
contrast between the paintings thus reproduced. Before the 
showing of the paintings each student was given a copy of the 
test and time was allowed for reading the directions. By the 
adjustment of Venetian blinds the room was darkened enough 
to permit good vision of the projected pictures, while enough 
light was admitted for reading and writing. The only writing
required of those taking the test was the indication of
responses by a letter written in a blank space.


The pair of paintings was selected chiefly for their 
numerous points of contrast. One painting was a Rubens, the 
other a modern work by William Gropper. The test items 
were organized in accord with the scheme of analysis and 
synthesis which had been followed in class. The first set of 
items dealt with color, the second set with composition, and 
so on. A variety of the conventional multiple-choice items was 
used. The following is an example: 


In Painting A (the Rubens was designated A) more use is made of

a. linear contrasts
b. straight lines
c. sharp angles
d. rhythmic curves

than in Painting B.


The test is in process of analysis, the results of which will be 
used in revising and lengthening it. 


This technique obviously does not require any particular 
type of test item but is adaptable to all of the conventionally 
used forms involving comparison and contrast. Reproductions 
of works of sculpture and architecture could, of course, be 
utilized as well as those of painting. Since this method affords 




a means of providing colored reproductions which are pre- 
cisely relevant to the aims, content, and method of a course, 
it appears to have possibilities for classroom use which the 
standard tests on the market do not possess. This type of test, 
moreover, is decidedly inexpensive, whereas the cost of the 
better standard tests in art places serious restrictions upon 
their use. 


































SOME OF THE LESS MEASURABLE OUTCOMES 
OF EDUCATION* 


EDWIN J. BROWN 


Kansas State Teachers College 


I NEED not say that I appear before this group with considerable
apprehension and not a little hesitation. Frankly,
it is not the group which is causing my trepidation, but my 
subject. 

When one speaks of outcomes in education he is talking 
about the very essence of it all. He is talking about our end- 
product, the thing for which we throw in the current, gear up 
the machinery, put in the man-hours, spend the money; the 
thing which we get after we’ve done the work. Outcomes in 
education are for us what the finished interceptor, the eight- 
gun pursuit plane, the one-ton bomb, the 155-millimeter field 
piece, the thoroughly trained airman is to our war program. 
Thus my hesitation in discussing outcomes in education at all, 
even those which we are agreed are more or less measurable. 
To discuss the more measurable outcomes before this group 
would take some boldness and one should attempt it with 
much hesitation, but to discuss the less measurable outcomes is
about twice as risky. My only comfort is that no one knows
much more about it than does his neighbor. And I am not
supposed to tell how to measure them.


First of all, what are some of the outcomes (may I assume 
there are such) which we want to get—outcomes which are 
difficult of measurement?





May I say that after spending some fifteen years rather 
directly in the field of measurement, I am not nearly so certain 


*Paper read at the meeting of the National Association of Teachers of 
Educational Measurement, San Francisco, February 24, 1942. 




of the efficiency of the work as I once was. Heresy? Right— 
but you can throw me out of the organization later. In general, 
I’ve about come to the conclusion (there are exceptions, of 
course) that the ease and accuracy with which any educational 
outcome is measured is in direct proportion to its unimpor- 
tance. That is, the easy items to measure accurately are the 
ones which make the least difference whether they are meas- 
ured or not. I agree that there are notable exceptions to my 
generalization. In general, though, you agree with me, don’t 
you, that the more important things of life tend to be beneath 
the surface, too deep to be picked up readily on the hooks of a 
question, and that measurement is usually involved in
questioning, either direct or indirect?


Does this mean we should not try to measure these things? 
I’d say, “Certainly not.” Let’s work on the thing rather than
say that it is one of the unmeasurables and that we can’t do 
anything about it. 


First of all, I’d like to start with the thought that the 
more difficult of measurement outcomes fall into general 
classes. (There may be three, six, or nine.) Let us call two 
of these difficult-to-measure groups, for want of a better term 
at this time, Outcomes in Attitudes and Outcomes in Appreciations.
I can measure fairly accurately some outcomes in
arithmetic skills, in spelling accuracy, in verb usage, but I seem 
to have much difficulty in measuring the same youngsters in 
their attitudes toward arithmetic, toward spelling, toward 
grammar. I find myself relying on clues which are not too 
clearly defined in my own mind, when I try to measure their 
attitudes. Can these clues, then, be developed, expanded? Are 
the big things in life, after all, caught rather than taught? I 
sometimes get confused, and when so, am inclined to say yes. 
If we go into attitudes we can break them down into any 
number of divisions. There are attitudes toward school, 
toward home, toward boys, toward girls, toward law and 
order, and so on, exhaustively. However, these can be grouped 
into two big categories from which a further breakdown might 




come. I refer to attitudes and traits which are primarily con- 
cerned with personal growth and attitudes and traits primarily 
concerned with our relationship with others. Each general 
item is susceptible to a further breakdown, of course. This I 
shall suggest later. 


Isn’t one of the wrong assumptions we make when we 
speak of the more unmeasurable outcomes of education that 
we are inclined to fail to do what we always do in working 
with the more measurable items, viz., break them down into 
some of their component parts? We take arithmetic computa- 
tion and break it down into the four fundamental operations 
of adding, subtracting, multiplying, and dividing. Then we 
take addition and break it down even further, trying to find 
not only the weakness in addition but the cause of that weak- 
ness. Might we not, if we tried seriously, break down atti- 
tudes concerned primarily with our own personal growth, into 
smaller units, breaking these in turn into still smaller ones 
until we might secure parts small enough to be measured? 
Of course, we would not be sure that an old axiom would not 
be ruined and we'd find after we did our measuring that the 
whole is not equal to the sum of the parts. 


Suppose we take the topic of attitudes in personal develop-
ment. What are some of the things which might be considered 
from the viewpoint of a high-school boy or girl? Of course 
no one can name all of the desirable attitudes which are worth 


considering, but let’s begin: 


1. An attitude of open-mindedness. We might interpret 
this as the Evaluative Criteria for the Cooperative 
Study for Secondary School Standards does: A willing- 
ness to revise opinions and conclusions in the light of 
new evidence. 

2. An attitude of critical-mindedness. Disposition to seek 
causes or explanation, to weigh evidence with care, and 
to withhold judgments until sufficient evidence is in. 

3. An attitude of concentration. Ability to give attention 
through a considerable period of time in spite of dif- 
ficulties or distractions. 







4. An attitude of industriousness. Disposition to use time 
and ability effectively and constructively. 

5. An attitude of responsibility. Willingness to acknowl- 
edge responsibility for one’s acts and obligations. 

6. An attitude of self-reliance. Willingness to make
decisions and carry out plans oneself instead of depending
on others or the school.

7. An attitude toward self-control. Ability to avoid
display of temper or other uncontrolled emotion.

8. An attitude of creativeness. Desire to do or say things 
in a new or better way. 

9. An attitude of enthusiasm. Readiness to enjoy life and 
participate in its wholesome activities. 




These we will all grant have something to do with personal 
development. 

There can be little doubt but that some of these items are 
more susceptible to objective measurement than are others. 
Again, there is little doubt but that each of these is more sus- 
ceptible to measurement than is the general outcome used for 
illustrative purposes from which they came, the outcome of 
personal development. 

My suggestion is now that each of the items be considered 
in turn for a further breakdown. This, of course, would 
entail the development of a valid definition, which probably 
would go back to the common consent, massed judgments 
technique. 

In the field of social relationships we have another of the 
more difficult to measure outcomes of education. No one, of 
course, argues that the outcome is insignificant because it is 
hard to measure, or that it is found completely embodied in 
other outcomes which are easier to measure. Shall we then 
pass it up entirely? Let’s see what we might do to it, again 
falling back on the Cooperative Study material for sugges- 
tions. We suggested that open-mindedness, critical-mindedness, 
concentration, industriousness, responsibility, self-reliance, self- 
control, creativeness, and enthusiasm are desirable, but difficult 
to measure, outcomes of personal development. Now what are 




some desirable outcomes, broken down just once, of social 
relationships? Suppose we say: 
1. Social-mindedness. Willingness to subordinate personal 
advantage to the common good. 
2. Co-operation. Desire to work agreeably with others.
3. Tolerance. Good will toward groups or individuals of 
different race, customs, or opinions. 
4. Courtesy. Consideration of others. 
5. Generosity. Willingness to share opportunities or
privileges.
6. Honesty. Integrity in handling money, straightfor- 
wardness, sincerity in personal relationships. 
7. Dependability. The extent to which one fulfills prom- 
ises, discharges obligations, finishes tasks. 
8. Loyalty. Devotion to interests of friends, school, 
home, country. 
9. Fair play. Unwillingness to take advantage of others 
or another. 




Our difficulty, of course, is to get measurements of atti- 
tudes toward, not just of information about. One can com- 
paratively easily measure information about, but not the 
attitude toward. 


One could go on and build these up. The point I would 
make is that while these outcomes are difficult of measurement, 
each breakdown tends to become more objective or perhaps 
better, less subjective. If in turn one were, for instance, to
analyze honesty for a high-school pupil, it might be found that 
a fairly valid test could be set up. I’m inclined to guess that if 
the validity could be assured, reliability would follow fairly 
readily; that is, a test could be made which would agree with 
itself. 

Appreciations is the generic name we give to another group 
of outcomes which are deliberately sought. We have, however, 
not gone so far as we might in developing measures, largely, 
I suppose again, because of the feeling of intangibleness. 
Undoubtedly, these are even more difficult of measurement, as 
the emotional factor enters in. However, again, it is not out 




of place to say that appreciation of beauty in nature, or better, 
in art, is not so difficult to measure as appreciation in general. 
Appreciation of commendable conduct and qualities in others 
might be measured—indifferently well perhaps—but still 
measured. 

Appreciation of home and family would seem susceptible 
of some measurement, and so on for other items such as ap- 
preciation of good workmanship, appreciation of spiritual and 
religious values, appreciation of law and constituted authority, 
and others. 

It seems to me that a prime reason for not doing more, at 
least in the attempt to measure certain educational outcomes, 
lies in our unwillingness to attempt a further breakdown which 
is always a step in measurement. To illustrate: Under out- 
comes of social relationships which are surely end-products of 
education, I listed among others, courtesy and fair play. Let’s 
see what a further breakdown would do. We defined courtesy 
simply for the sake of mutual agreement as consideration for 
others. Let’s start a further analysis, considering the subject 
from the viewpoint of a junior in high school, somewhat as 
follows: Do I always wait my turn; do I refrain from loud 
talking and laughing when it disturbs others? Do I refrain 
from interrupting others when they are talking? Do I offer 
to share what I have with others? And so on. Isn’t it possible 
that this rather intangible outcome of education can be meas- 
ured, and much more reliably than we are inclined to think? 


Fair play, which is one step down from the general out- 
come, social relationship, might in like manner be susceptible 
to analysis. Fair play; unfair advantage; cheerful loser; 
modest winner; recognition, appreciation, and commendation 
of skill in others; consideration of the sensibilities of others; 
abiding by decisions without question either by word or act; 
loyalty to personal and team ideals; observation of training
rules in athletics; winning without boasting, losing without 
whining; willingness to sacrifice for a group good, and so on. 
Each item in turn might be analyzed still further. We make 




the mistake of trying to measure an important item like this 
with a short test. It requires a test as well developed as a 
Binet revision. 

Perhaps, in conclusion, it should be said we must not be 
too much perturbed at first about the measurement of these 
outcomes but should turn our attention first to securing these 
outcomes with a greater degree of certainty. It would be 
desirable indeed to be able to measure results in a citizenship 
class—if we are sure that we are teaching citizenship. Patriot- 
ism is dear to every American’s heart, but who knows for dead 
sure what it is—or how to teach it? 

I would indeed be disconsolate did I not believe we are 
doing better work both in teaching these intangible outcomes 
as well as in measuring them than we think we are. The clues 
we get are probably fairly good. This is, of course, wishful 
thinking. Someone has said, “’Tis better to travel hopefully
than to arrive,” and Browning puts it, “A man’s reach must
exceed his grasp or what’s a heaven for?”







THE AIMS, OBJECTIVES, AND OUTCOMES OF THE 
OHIO TESTING PROGRAM* 


RAY G. WOOD 


Ohio State Department of Education 


TODAY the world is moving along at an increased tempo

and at a higher degree of efficiency. Education, too, must 
swing into step, and it is doing so. Education, it is agreed 
today, is for the whole of life; it is concerned with the de- 
velopment of the “whole nature of every pupil,” with the in- 
tegration of his total personality for the good of himself and 
of society. There is disagreement, however, as to just how 
much time and attention the school should give to the develop- 
ment of the social, moral, and physical phases of the child, 
and how much time and attention to the development of his 
intellectual side. 

Some progressive extremists would sacrifice the mental 
development of the student to the personalizing of his char- 
acter, or make it a complementary factor only; traditional 
extremists would reverse the procedure. Neither policy is 
wholly applicable to our present educational situation or meets 
the needs of the typical American school. 

What is needed now, more than ever, is that from our 
theorizing and experimentation some tangible and definitely 
constructive guiding principles, practicable in all our public 
schools, be evolved for the development of integrated social- 
intellectual personality. What these shall be is still a question, 
but I believe with many that we should train the mind to the 
maximum of possibility, taking into account the limitations of 
all methods, salvaging the best that is in them, and inventing 
new ones that are more generally successful. 


*Paper read at the meeting of the National Association of Teachers of 
Educational Measurements, San Francisco, February 24, 1942. 



In doing this we shall arrive at new methods, new objec- 
tives, new subjects, and new curricula. But to make possible 
this more comprehensive program, with its broad conception of 
the schools’ activities, time (so essential to everything now) 
must be saved. Overlapping, useless, and minor material will 
have to be discarded to make way for the many newer and 
more important elements that are to be added; that is, a better, 
clearer understanding of the tangibles needed will leave more 
time for the needed intangibles to be developed. 

And it is for this saving of time, it is for the determination 
of achievement in the needed tangibles, that a good testing 
program is requisite. 

We in the testing field in Ohio are justly proud that our 
work is helping to do this, that it is in line with the modern 
philosophy of education, in fact, that it is giving direction to 
it in a concrete and personal way. By providing the great 
majority of teachers in Ohio a scientific means of analyzing 
and evaluating their product, we are saving them time to 
achieve more, and more efficiently. 

Our program, which has been carried on since 1929 as a 
division of the State Department of Education, is in its several 
phases unique to Ohio. Its chief objective is the motivation of 
scholarship—it stimulates the educational units to put forth 
more effort and seeks to increase the efficiency of that effort. 
There is no compulsion whatsoever about participation in our 
program, and there is no attempt at standardization. These 
are important features, we believe, contributing to its effective- 
ness and its popularity. Besides, it is distinctly a product of 
and for the Ohio schools, because the tests are built by Ohio 
teachers for Ohio children. They are designed not to de- 
termine the success or failure of the individual but to help 
teachers to adjust their teaching to the needs of their children 
and the children to adjust themselves to the work of the 
particular class or subject. Because of this, students in the 
state no longer approach tests with fear and trembling, in such 
a disturbed state emotionally and mentally that the results are 
impaired, but they anticipate the testing with a spirit of sports- 




manship and with a realization that whatever the results may 
be, they will aid, not hinder, them. 

Since the beginning of our testing program, the democratic 
philosophy of education has been the guiding principle upon 
which the work of the Ohio Scholarship Tests division is 
based. The tests of the program are revised annually, and 
thus provision is made for meeting the changing curricula, 
textbooks, and methods of educational practice. Furthermore, 
there is no compulsion to use any of the tests administered. 
Schools, private as well as public, are free to use any phases 
of the program for such purposes as they wish. 

Stated concisely, the objectives of the several phases of 
our program are: 


1. To provide materials for the improvement of instruc- 
tion. 


2. To provide a continuous program of new and improved 
tests. 


3. To provide for the motivation of students toward 
greater accomplishments in their classroom activities. 


4. To provide pertinent instructional research data. 
5. To provide curriculum guides. 


These objectives are achieved variously by the six distinct 
phases of our testing program, which are: The Every Pupil 
Tests, the Eighth Year Test, the General Scholarship Test for 
High School Seniors, the District-State Scholarship Team 
Test, the Senior Survey Tests, and the Bulletins of Research 
and the Curriculum Guides. 

An idea of the popularity of the program may be gained
from these figures: a total of 1,200,772 tests were adminis- 
tered in this program last year. In this number was repre- 
sented every county in the state and over a thousand large 
and small city schools. Of this number, 41,269 were eighth- 
graders; 1,146,672 were grade-school and high-school pupils 
who took part in the Every Pupil Tests; 5,305 were high- 
school seniors in the upper third of their classes; and 7,526 
were high-school students in grades 9 to 12, who were selected 




to take part in the annual spring academic and commercial 
contests. 

This popularity is due to the fact that the school men of 
the state look upon the testing program as one of the most 
vital and beneficial functions in the state educational set-up. 
They recognize the testing program as one of their own 
supervisory tools, as an active force growing out of and along 
with the actual conditions in their schools, not as a measuring 
stick imposed from without. 

A brief discussion of the several phases of this program 
will give a clearer picture of the program. 

The Every Pupil Tests, because they affect the greatest 
number of children, may be considered the most important 
phase of our program. There are two series of these tests: 
the First Every Pupil Test, administered in December for 
general diagnostic purposes; and the Second Every Pupil Test, 
administered in March for an achievement measurement and 
as a check upon the effectiveness of the remedial teaching which 
has been carried out on the basis of the results of the First 
Every Pupil Tests. 

In these series are included tests for all the subjects that 
are most commonly taught in Grades 3 through 12. For 
example, we have tests in English for Grades 3 through 12; 
Reading for Grades 2 through 12; Mathematics for Grades 3 
through 10; and in the other subjects such as Latin, French, 
Geography, Social Studies, Chemistry, Physics, General Sci- 
ence, Biology, Health Education, and Hygiene. New forms 
of each test are developed for each administration, and new 
tests are added from time to time, such as Attitudes and Skills 
in the Use of References, Conservation, and Scientific 
Thinking. 

These tests are of the achievement type and are so con- 
structed that they give good general diagnoses of both the 
individual and the class abilities and deficiencies in the prepara- 
tion in the specific subject. For an analysis of the more 
puzzling and particular individual difficulties, the teacher must 
use specifically diagnostic and functional tests; but, for the
general group and the average pupil, these tests have proved 
very satisfactory in the thirteen years they have been 
administered. 

As was mentioned before, these tests have their origin in 
the Ohio classrooms. They are constructed by Ohio teachers 
of recognized ability, working in committees or individually; 
they embody suggestions sent in by teachers and administrators 
throughout the state; and they are validated against research 
studies, committee reports, the most used textbooks, and the 
Ohio Curriculum Guides. In this way we are sure that the 
tests are really measuring what is or should be taught, and 
that they are of service to the teachers and students in 
emphasizing and vitalizing the important content. 

Each subject-test takes one forty-five-minute period and
may be administered either in the individual classroom or as 
best suits the purposes of the school. All tests are scored by 
the classroom teacher. Forms are furnished with each order 
for recording for state use the general distribution of the 
scores of a class and for making item reports. These reports 
are, of course, kept in strictest confidence. As soon as they 
are received by the state office, state percentile and item norms 
are compiled from the data. These are then printed and mailed 
to the participating schools. 
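
The compiling of state percentile norms from the reported score distributions can be sketched as follows. This is only an illustration: the article does not say how the state office computed its norms, so the mid-point treatment of tied scores, the function name `percentile_rank`, and the sample scores are all assumptions.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(score, all_scores):
    """Percent of scores below `score`, counting half of the tied scores."""
    ordered = sorted(all_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)

# Hypothetical state-wide scores pooled from the class distribution forms.
state_scores = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]

# A teacher can then read each pupil's standing against the state norm.
class_scores = [15, 22, 30]
class_norms = {s: percentile_rank(s, state_scores) for s in class_scores}
```

With norms of this kind in hand, a class score is read as a standing among all participating pupils rather than as a raw count of correct items.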

By analyzing and interpreting the results of the work of 
her own class in the light of these norms, the teacher is able 
to determine wherein there are deficiencies and can set about 
to determine the particular causes of the weaknesses. Similar 
analyses and interpretations may be made for the individual 
pupil—in fact in not a few instances the students themselves 
analyze and interpret their own results. Thus, comparatively 
early in the year both the teacher and the student have an 
estimate of the equipment of the student for the course and 
indications of probable areas of difficulty, and together they 
can set about to discover the causes of these difficulties. When 
the causes are determined, then a definite remedial program 
may be settled upon. 

After the Second Every Pupil Test has been administered
and the analyses and interpretations have been made, a com- 
parison with the results of the First Every Pupil Test reveals 
whether progress has been made and whether the remedial 
procedures have functioned effectively. Through these com- 
parisons every pupil has a diagnosis of his achievement and 
has evidence of the phases of the subject he has mastered and 
the phases upon which he must still concentrate. This is the 
factor that is emphasized continually—the improvement of 
the original product. It has done much to lessen, if not to 
erase completely, the fears of teachers that these tests are a 
means of measuring teacher efficiency. Teachers have come to 
realize that high scores or low scores on the tests are not the 
important factors, that the really important factor is the 
improvement of learning as indicated by the raising of the 
scores of the individual pupils and of the class between the 
first and second series. They know that a low-ranking class 
that shows progress is a better evidence of good teaching than 
a high-ranking class that remains at the same level from test 
to test. 
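
The comparison between the two series can be sketched as a simple gain computation. The pupil labels and scores below are hypothetical, and the sketch assumes that the two scores are expressed on a comparable scale (for example, as state percentile ranks), since a new form of each test is developed for each administration.

```python
from statistics import mean

def gains(first, second):
    """Per-pupil gain from the First to the Second test, for pupils who took both."""
    return {pupil: second[pupil] - first[pupil] for pupil in first if pupil in second}

# Hypothetical standings of three pupils on the December and March tests.
first_test = {"A": 40, "B": 55, "C": 62}
second_test = {"A": 52, "B": 58, "C": 61}

pupil_gains = gains(first_test, second_test)   # improvement of each pupil
class_gain = mean(pupil_gains.values())        # improvement of the class as a whole
```

On this view a class that starts low but shows a positive `class_gain` gives better evidence of effective teaching than a high class whose gain is zero.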

The Eighth Year Test and the General Scholarship Test 
for High School Seniors are two of the other important parts 
of the Ohio Scholarship Testing program. These tests are 
measures of the cumulative achievement of these respective 
groups and are administered in the spring of each year. The 
tests are designed to measure not only factual knowledge as 
such but also the ability to use this knowledge in functional 
situations, and to stimulate the desire for the acquisition of 
such knowledge and ability. The Eighth Year Test is a two- 
hour test in the four fields of English, mathematics, history, 
and science, and may be taken by any eighth-grader. The 
Senior Test consists of rather general tests in the following 
five areas—English, mathematics, science, social science, and 
reading and functional language, with 30 minutes allowed for 
each test. Because one of its chief purposes is the selection of 
outstanding students, only the upper third of the graduating 
seniors are admitted to this examination in Ohio. However, it 
is administered to all high-school graduates in the state of
New Mexico, where it is administered under the direction of 
the University of New Mexico at Albuquerque. Other colleges 
and universities outside Ohio likewise use this test in their 
freshman placement programs. 


High general scholarship is evidenced not merely by an 
acquaintance with the basic principles in the general fields of 
learning but likewise by the ability to apply this knowledge to 
life situations; it calls for a broad as well as a thorough educa- 
tional preparation. The objective in these two tests of our 
program is the stimulation of this high general scholarship, 
and it is evident that they are serving as real stimuli in this 
respect. Follow-up studies have proved that the specific in- 
centives of these tests have resulted in an increased interest of 
students in their achievement, have encouraged many to 
broaden their educational preparation through wider and more
general reading, and have led many high-school students to
choose more widely from the courses offered in their program 
of studies. Hundreds of scholarships are awarded annually 
by many colleges and universities in and outside of Ohio to 
seniors ranking high in this test. Follow-up studies have 
shown that these students do considerably better than average 
work in the institutions at which they matriculate; and research 
has shown that these tests are predictive for the group as a 
whole of probable success in continued education. 


The fourth part of our testing program, the District-State
Academic and Commercial Scholarship Tests, is administered
in May of each year at the five state universities (in
Ohio well located geographically for this purpose). The
District-State Academic Test has become the “scholastic 
event” of the year in Ohio; it fosters interest in academic 
achievement in a manner akin to that in which athletic events 
stimulate athletic prowess. The students enter with a great 
deal of enthusiasm and interest, and they work hard in order 
to make the scholarship team and participate in this “academic
field day.” Last year 7,526 students in a total of 237 teams 
participated. Schools send teams of 32 students, or fewer, to
their closest university center to compete with other teams and 
students in that district for academic honors. Each team is 
limited to two entrants in each of sixteen subjects, which include 
English, mathematics, history, the sciences, Latin, and modern 
languages. The schools are classified according to enrollment, 
except that schools of a county system combine to send one 
team. Points are given for the first twenty places in each of 
the subjects, and the teams are ranked superior, excellent, or 
honorable mention, according to the number of points they 
accumulate. District awards are made to individuals and to 
teams according to the classifications of the schools. Then all 
papers are sent to our state office, where similar awards are 
computed for the all-state winners. 
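
The team scoring described above can be sketched as follows. The article states only that points are given for the first twenty places and that teams are ranked superior, excellent, or honorable mention; the point schedule (twenty points for first place down to one for twentieth), the cut-off values, and the school names are hypothetical.

```python
from collections import defaultdict

PLACES_SCORED = 20  # from the article: points go to the first twenty places

def place_points(place):
    """Assumed schedule: 20 points for first place down to 1 for twentieth."""
    return PLACES_SCORED - place + 1 if 1 <= place <= PLACES_SCORED else 0

def team_totals(results):
    """`results` maps each subject to the list of team names in finishing order."""
    totals = defaultdict(int)
    for ranking in results.values():
        for place, team in enumerate(ranking, start=1):
            totals[team] += place_points(place)
    return dict(totals)

def rating(points, superior=50, excellent=30):
    """Hypothetical cut-offs; the article does not give the real ones."""
    if points >= superior:
        return "superior"
    if points >= excellent:
        return "excellent"
    return "honorable mention"

# Two subjects; a team may place up to two entrants in each.
results = {
    "English": ["Lincoln", "Central", "Lincoln"],
    "Latin": ["Central", "Lincoln"],
}
totals = team_totals(results)
ratings = {team: rating(points) for team, points in totals.items()}
```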


The purpose behind this part of the program is again a 
motivation of scholarship, especially by the granting of recog- 
nition to students of outstanding achievement. It should be 
noted that it is not only the thirty-two pupils on the team who 
receive this motivation, but also the hundreds of others who 
try to make the team. The truly professional and able
administrator makes the most of this opportunity and encourages
every student to strive to win a place on the scholarship team,
by not announcing the appointees until just before the meet.
We have athletic contests, music contests, and forensic con- 
tests; why shouldn’t we have academic contests? Why 
shouldn’t we popularize the “brains” as well as the “brawn” 
of our schools? Ohio schools have recognized the values of 
this academic competition, and school people would not be 
happy if it were to be discontinued. 

The fifth phase of our program, the Senior Survey Tests and
remedial materials, consists of comprehensive diagnostic tests
designed for locating deficiencies in the fundamentals in English
usage, reading, and mathematics, administered the first
week of either semester. Along with the tests are provided
manuals and workbooks for the remedial work, which are so 
organized that the instruction may be carried on individually 
or in groups, with a minimum of direction on the part of the
teachers. This part of the program had its origin in the plea 
of college leaders who found many high-school graduates 
entering their institutions poorly equipped in these fundamen- 
tals so necessary to the carrying on of successful work. If 
high-school students who enter college are in need of remedial 
work in these areas, many who do not go on to college prob- 
ably also need to have their ability in these fundamentals 
improved. Recognizing the urgency of this need, the Ohio 
State Department of Education is granting one-half unit of 
high-school credit to each senior who shows proficiency in 
overcoming the weaknesses clearly indicated by his results on 
Form A of these tests. Many colleges and universities 
throughout the nation have recognized the merit of these 
materials and are using them as a part of their freshman 
program of testing. 

So much for the tests themselves and the valuable services 
they render to teacher and pupil alike, when properly admin- 
istered, analyzed, and interpreted. Let me suggest briefly some 
of the concomitant values that are to be derived from such a 
co-operative testing program. 

The first is the great number of research studies that the 
results of these tests give rise to and that are of particular 
value to the teachers of the system because the bases lie in the 
local situation. In Ohio many such studies have been com- 
pleted, and teaching procedures have been influenced to the 
end that the indicated weaknesses are being remedied. Very 
complete studies have been made in English, mathematics, 
and the social sciences, and less comprehensive ones in other 
subjects. Two research reports, R1 and R2, have recently been
issued on the Every Pupil Tests; these give superintendents
and teachers techniques for determining growth learning
curves on the basis of their own class scores, and for
interpreting their significance to the individual pupil and teacher.

A second very important concomitant value is the aroused 
interest of teachers in the improvement of their classroom 
product and in an understanding of the scientific diagnosis and 
measurement of their teaching procedures. This has been
evidenced by the vitalizing of curricula materials and by the 
functionalizing of the learning process. Most of the tests for 
the entire program are built by individual classroom teachers 
whose work has been recognized as outstanding or by com- 
mittees of teachers in the field of the subject. Teachers and 
administrators alike have expressed amazement at the broader 
understanding of their task and the other worth-while results 
that have come to them from their participation in these test- 
building projects. Not only have teachers taken an active part 
in the construction of the tests, but they have also been active 
in the writing of Curriculum Guides, which suggest materials 
to be taught and abilities to be developed in the various fields, 
recommend methods of procedure, and provide a working 
bibliography. These Guides are not attempts to dictate order 
in the presentation of material or methods of procedure nor 
are they attempts to supplant the local course of study; they 
are designed to assist teachers and local curriculum committees 
in the re-examination, re-evaluation, and re-formation of 
their curricula in the light of present trends in educational 
philosophy. 

A third and most important concomitant value is the stimu- 
lation and motivation of the thousands of students who 
annually come in contact with this program. The Every Pupil 
Tests help them to help themselves; they have specific and 
objective evidence of their achievements and of their lack, of 
their abilities and of their deficiencies, and go about their 
remedial classwork with understanding and with determination 
to improve. The other tests create an active interest in not 
just good scholarship, but in excellence of achievement. 

Time does not permit the listing of more of these values 
nor the further elaboration of those already mentioned—each 
in itself would furnish material for another paper. However, 
these suggestions and this brief survey of our Ohio Scholarship 
Testing Program have, I hope, presented the possibilities of
a varied, comprehensive, and co-operative state testing
program, and demonstrated that such a program is of real service
to its participants—teachers and pupils alike. 

























EDUCATIONAL REQUIREMENTS AND 
OCCUPATIONAL LEVELS 


RICHARD D. ALLEN and LESTER F. KRONE 
Department of Public Schools, Providence, Rhode Island 


IN every job description and worker description there
appears the category “Educational requirements.” These
educational requirements for almost any kind of work are 
somewhat elastic. In good times when labor is scarce, stand- 
ards always move downward; while in periods of unemploy- 
ment, standards are automatically raised. This is true with 
the standards of college and professional schools as well as 
with the standards of apprenticeship and of employment in 
the less skilled classifications. In fact, most employers and 
personnel workers regard educational requirements as merely 
a convenient and economical screen with which to eliminate the 
less desirable applicants. 


This situation is interesting in view of changes in educa- 
tional practice during recent years. Formerly most school 
adjustments were made in terms of grading; that is, the slower 
pupils were kept back grade after grade until there was a 
high percentage of over-ageness for the grade throughout the 
school system, at least until the legal age of school-leaving 
began to operate. Under such conditions the “last grade 
attended” really had a definite meaning in terms of school 
achievement and educational qualifications. In recent years, 
however, there has been a strong tendency toward the practice 
of promoting most children from grade to grade largely on 
the basis of age and attendance. Under such conditions, dif- 
ferentiation of instruction has been accomplished by classifica- 
tion or grouping within the grade, or by group assignments 
within each class. Consequently the “last grade attended” by
any child may not be a fair indication of his educational quali-
fications. In fact almost the only accurate method of appraisal 
of school achievement is by means of standardized tests, the 
results of which are relatively independent of the school 
system, the school, the curriculum, the teacher, and other 
factors such as the policies of pupil adjustment in the indi- 
vidual school or school system. 


A study of achievement among pupils of any grade indi- 
cates a distribution of scores covering a range of from five to 
eight school grades or educational ages. Under these circum- 
stances, a diploma or a grade of school leaving means little 
unless it is supplemented by such information as the teachers’ 
marks in academic subjects, marks in special subjects, infor- 
mation regarding the curriculum, and the classification of the 
pupil, and especially, if possible, marks in standardized tests 
in the basic skills and core subjects. These matters are not 
generally understood by employers and personnel workers. 
Instead they usually condemn the school product in a rather 
general and wholesale fashion, assuming that schools still are 
like those with which they were once familiar. An illustration 
in point may be helpful at this stage: 


In a large manufacturing plant the personnel manager, a 
very capable and wise man, called the school placement office 
for a bus boy to work in a lunch room, clearing away the 
dishes and carrying the trays. A strong body, a willing and 
pleasant disposition, and a reasonably agreeable personality 
were the chief requirements. There was little or no oppor- 
tunity for advancement, and experience had shown that an 
intelligent boy would not remain long at such work. The 
counselor selected a big, strong, sixteen-year-old boy from a 
special class for backward children. He was nominally a 
seventh-grade pupil, but actually his achievements in arithmetic 
and reading were about on the third-grade level. However, he 
was the kind of boy that the teachers would always send on 
simple errands or use as a helper in routine tasks. He was 
cheerful, willing, clean, and from an average home back- 
ground. The counselor explained all of this to the personnel
manager, who accepted the boy and reported after the first 
month that he was doing very well. 

One day, however, the office boy in the main office left to 
enter the military service; a second boy was ill; and a third 
had been sent on a long errand. It was necessary to move a 
number of heavy articles from the office and the bus boy was 
requisitioned for the purpose. In the midst of his work the 
general manager called him and sent him on an errand. The 
directions would not have been difficult for an average boy, 
but they were extremely difficult for him. He had to ask 
directions and it became evident that he could not read the 
names of officials on the various doors and some of the other 
strange signs around the plant. When this was reported to 
the manager he roundly condemned his personnel man for 
hiring such a stupid boy, and the school placement office for 
recommending a boy who could not read, and the school system 
for graduating a pupil under such conditions. This man was 
evidently thinking of the schools of a generation ago, and of 
conditions in employment at a time when bright and capable 
high-school graduates were glad of an opportunity to work 
from the bottom up. I am sure that he did not mean to be 
unfair or unjust. 

If personnel officers and counselors are to be realistic and 
truly helpful to young people, it is extremely important that 
they should neither over-estimate nor under-estimate their 
achievements and abilities. An accurate appraisal of educa- 
tional status and achievement is absolutely necessary in order 
to determine the readiness of any individual to enter a pro- 
gram of training at any occupational level. While a rough 
approximation of status and achievement may be obtained 
from the school record, it is at present possible to bring any 
record up to date by means of a battery or inventory of 
achievement and aptitude tests. No single battery will ade- 
quately serve the entire range of individual differences to be 
found among adults. Instead at least five different batteries 
would seem necessary to accomplish the purpose. Even a few 
years ago it would have been difficult to select adequate
test batteries for this purpose, but at the present time there 
are at least five excellent batteries which are available and in 
common use, and there are at least five different groups or 
levels for which these tests may be appropriately used. 

These levels in both educational and occupational oppor- 
tunities are shown in the accompanying chart: 
[Chart: Education, Age, and Beginning Jobs at Various Occupational Levels]


Interpretation: A person's present status may be estimated by his last
grade or by his test record. In any occupational level, a person in the highest 
third (A or B) is preferred and can usually obtain work even in slack times. 
A person in the middle third can usually obtain work in normal times. Persons 
in the lowest third may meet the minimum requirements, but can improve their 
chances of getting and holding a job by entering at a lower level and pro- 
gressing through in-service and supplementary training. This procedure solves 
most problems involving discrimination. 

RICHARD D. ALLEN.
LESTER F. KRONE.




The lowest level shows a group of approximately two 
thousand pupils on a seventh-grade level in the Providence 
junior high schools. The figure shows the distribution of these 
pupils in a battery of achievement tests. Both the Metro- 
politan and the Co-operative tests have been used and have 
shown approximately the same range and distribution of 
scores. It is interesting to note that more than a thousand 
adults and young people over eighteen years of age, who 
applied for training as production workers in jewelry and 
novelty manufacture, had a median score in reading, general 
mathematics, and academic aptitude equal to the seventh grade 
in the junior high schools. For these jobs there were practi- 
cally no requirements in regard to education. The work was 
repetitive and did not require even a mechanical aptitude or 
finger dexterity beyond the “D” level. However, all seven 
hundred graduates of the jewelry class were trained and placed 
after less than one month’s training at a minimum wage of 
forty cents an hour. 

The second figure is typical of the distribution of scores 
on the Co-operative tests at the end of the 9A or the beginning 
of the 10B grade. The median non-graduate school leaver in 
Providence has attained a tenth-grade status, has a C- I.Q.,
and ranks C in reading comprehension and general mathe- 
matics. It is interesting to note that in a group of more than 
two thousand men and women who were candidates for train- 
ing in defense classes in Providence, the median grade of 
school leaving was the tenth, and the median scores in academic 
aptitude, reading comprehension, and general mathematics 
were approximately equal to the tenth-grade median. These 
jobs for the most part, in the opinion of employers, required 
ability to read directions and a reasonable mastery of funda- 
mental skill in mathematics. These workers may be classified 
as production workers of the higher level. 

The third figure indicates the results of the Co-operative 
tests on approximately two thousand high-school graduates in 
Providence, showing range and percentile distribution. From 
these pupils are selected those who are admitted to colleges,
technical schools, apprenticeship schools, and nurses’ training 
schools, as well as many of the more desirable occupational 
opportunities which require high-school graduation. 


The fourth figure indicates the range and distribution of 
scores on the college sophomore Co-operative tests as shown 
in the Co-operative testing program of both local colleges in 
this area and the nation-wide program. Persons at this grade 
level usually make their decisions for specialization in college 
or for entrance into higher skilled occupations on the technical 
and semi-professional level. For instance, such persons are 
preferred for officer training in the armed services. 


The fifth figure shows the range and distribution of scores 
on the college Graduate Record Examination and also among 
the candidates for teacher-training positions in the Providence 
schools on the National Teacher Examinations over a period of 
the past decade. At this level people are selected for entrance 
to the graduate schools and the professions. 

The arrangement of these five figures opposite a scale, 
showing chronological and mental ages at the left and school 
achievement in terms of grades at the right, is for the purpose 
of facilitating appraisal and comparison. The procedure is 
somewhat as follows: Indicate by a check mark in the first 
column the person’s chronological age and by a cross his mental 
age. In the second column indicate by a check mark his last 
school grade and by a cross either his average scholarship 
rank as shown by his school record or the battery of achieve- 
ment and aptitude tests at that level. This may be done by a 
horizontal comparison of values indicated in the appropriate 
figure. Then draw a horizontal line from the cross on the 
grade measure intersecting the figures representing occupa- 
tional levels. Frequently this line will cross two or even three 
figures. If the line does not fall in the upper half of the figure 
it is possible, and even probable, that the worker will find it 
easier and more profitable to attempt to meet the requirements 
on the lower level in which his qualifications will give him a 
preferred status. 
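
The reading of the chart can be sketched in a few lines. The grade-equivalent spans below are hypothetical stand-ins for the five figures (the printed chart gives the real ranges), and the rule coded here is one plausible reading of the procedure: take the highest level in which the line falls in the upper half of the figure, and otherwise the lowest level the line crosses.

```python
# Hypothetical grade-equivalent spans (low, high) for the five figures.
LEVELS = [
    ("production worker", 4.0, 10.0),
    ("higher production worker", 7.0, 13.0),
    ("high-school graduate", 9.0, 15.0),
    ("technical and semi-professional", 11.0, 17.0),
    ("professional", 13.0, 19.0),
]

def levels_crossed(grade_equiv):
    """Return (level, relative position 0..1) for every figure the line crosses."""
    return [
        (name, (grade_equiv - low) / (high - low))
        for name, low, high in LEVELS
        if low <= grade_equiv <= high
    ]

def suggested_entry(grade_equiv):
    """Highest crossed level with an upper-half standing, else the lowest crossed level."""
    crossed = levels_crossed(grade_equiv)
    if not crossed:
        return None
    upper_half = [name for name, position in crossed if position >= 0.5]
    return upper_half[-1] if upper_half else crossed[0][0]

# A person whose tested skills equal those of the average eleventh-grade pupil.
entry = suggested_entry(11.0)
```

Under these assumed spans the line for an eleventh-grade standing crosses three figures, and only in the second does it fall in the upper half.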


This chart is to be used only in determining the most 
advantageous place of entrance into an occupational level. It 
helps to get a person on the pay roll on a realistic basis. The 
second step in the process is to work out a program of supple- 
mentary and in-service training and experience which should 
help the individual to improve his educational and occupational 
status and thus to earn promotion, after which he may, if he 
so desires, transfer to a job in a higher occupational level. 

The general use of objective methods in appraising the 
qualifications of applicants and the use of such devices as the 
present chart for purposes of comparing different levels as 
well as in determining the status of a person within any group, 
will be an important step in promoting equality of opportunity 
and preventing favoritism and group discriminations. For 
instance, a relief worker was assigned to a technical research 
project because he had attended a college for two years and 
expressed an interest in such work. He liked the prestige of 
his assignment and refused other kinds of employment despite 
the fact that no such jobs were available to one of his quali- 
fications. Tests showed that his present educational skills 
were about those of the average eleventh-grade pupil and 
among the lowest 3 per cent of college sophomores. When 
shown his results on the chart by the counselor, he decided to 
enter a defense training class and now has a good job as a 
production worker. His social worker was also glad to have 
the support of objective data based upon the measurements of 
abilities of the worker rather than upon his observations or 
opinions. 

A similar instance was that of a star athlete who grad- 
uated from a high school with the respect of both faculty and 
student body. He wanted to become an apprentice machinist 
and felt that he was rejected because he was a Negro. His 
school record showed C’s and D’s in most subjects. Moreover, 
these marks were obtained in non-college subjects and in the 
slow-learning class sections. His scores in aptitude tests and 
in the Co-operative tests in English, Social Studies, Mathe-
matics, and Science all placed him in the lowest fifth of the
class. On the basis of such evidence he could be shown that 
many non-Negroes would also be rejected even with much bet-
ter qualifications. However, his qualifications would easily
admit him to a defense training class and his excellent char- 
acter and personality record, as well as his physical assets, 
would make him desirable as a production worker in the same 
plant where he had been rejected as an apprentice. Moreover 
his pay as a production worker would be much higher than as 
an apprentice. In addition, if he still wanted to become a 
machinist, he could enroll in free evening courses in mathe- 
matics, drawing, science, and machine shop practice, and when 
he had mastered the necessary fundamentals and skills, he 
could be examined and certified by the school authorities, by 
the state civil service, or by the state director of apprentice 
training, and on the basis of such evidence he should be able 
to obtain employment as a machinist. In recent years the 
placement office has had requests from a number of employers 
for at least a few Negroes of superior qualifications to indicate
that they have abandoned the practice of race discrimination.
They were unwilling to employ them or to reject them because
they were Negroes, but would be glad to employ them if they
could demonstrate that they were as well or better qualified 
than others who were being employed. 

Accusations of prejudice and discrimination are the per- 
petual alibis of the unsuccessful candidate for any job. The 
only effective answer is the more general use of objective 
methods in determining the qualifications of candidates and 
more objective and accurate production records in determining 
promotions. 





THE PREDICTION OF SUCCESS OF STUDENT 
ASSISTANTS IN COLLEGE LIBRARY WORK 


GRACE M. OBERHEIM 
Iowa State College Library 


THERE are many problems which arise in connection with
the selection and use of student assistants in college 
library work. The specific problem which led to this study was 
the difficulty of obtaining student assistants in the Loan De- 
partment of the Iowa State College Library who were capable 
of doing the required work successfully. It was thought that 
high scholastic grades and high scores on certain selected tests 
would have a positive relationship to successful work in the 
library. 
A testing program was set up at the Iowa State College
Library during the college year 1937-38. The purpose of this 
program was to discover the extent to which academic grades 
of student assistants and scores made on certain selected tests 
might be used to predict success in various types of college 
library work. The results reported here are based upon data 
obtained for 307 undergraduate student assistants¹ who
worked in all departments of the college library during the 
college years 1937-38 through 1939-40. The predictive indices 
available for this group included the American Council on 
Education Psychological Examination scores, the National 
Institute of Industrial Psychology Clerical Test (American 
Edition) scores, and the grade-point averages for one quarter 
of college work. In addition scores on the Bell Adjustment 
Inventory were available for 69 assistants who were included 
in the group of 307. 
All students take the Psychological Examination when they 
enter college. The grade reported in the study was the 
grade-point average² made by the student assistant for the quarter in 
which he took the Clerical Test at the library. The library 
rating was made at the end of the same quarter. 


¹The group was composed of 174 freshmen, 86 sophomores, 37 juniors and 
10 seniors. Of this number, 71 were women and 236 men. One hundred and 
thirteen students (40 women and 73 men) had had some previous library 
experience at the time they took the Clerical Test while 194 (31 women and 
163 men) had had no previous library experience. “Library experience” as used 
in the study may be defined as more than four weeks of work, usually on a 
part-time basis, in a library. An assistant who had worked four weeks or less 
was considered in the group “without library experience.” 

The criteria of success for student assistants in college 
library work were (1) ratings made by librarians who super- 
vised the work of assistants and (2) records of student pro- 
motions within the library. 

The graphic rating scale used was adapted from one de- 
scribed by Filer and O’Rourke (3) and by Symonds (4, 
66-68), and instructions similar to those described by Symonds 
were given to each rater. The staff member best acquainted 
with the student’s work and directly in charge of it rated the 
assistant. The ratings were scored by assigning numerical 
values to the five different divisions of the rating line, with a 
possible range from 0 to 4 on each division and a total range 
from 0 to 40 on the ten items. 
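
The scoring rule just described can be sketched in a few lines. This is only an illustrative sketch: the function name and the sample marks are hypothetical and are not taken from the study, and it assumes that each of the ten items receives a whole-number value from 0 to 4.

```python
def score_rating(item_values):
    """Total a ten-item graphic rating in which each item is scored 0 to 4."""
    if len(item_values) != 10:
        raise ValueError("expected ten rated items")
    if any(v not in range(5) for v in item_values):
        raise ValueError("each item must be a whole number from 0 to 4")
    return sum(item_values)


# Hypothetical marks for one assistant (not data from the study).
example_marks = [3, 2, 4, 3, 2, 3, 1, 4, 2, 3]
total = score_rating(example_marks)
print(total)
```

Under these assumptions the lowest possible total is 0 and the highest is 40, which matches the range stated above.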

Only one rating was used in the study since it was found 
that very often there was no second person equally competent 
to make the rating. As a measure of the reliability of the 
ratings used, a second rating was made for two smaller groups 
of assistants included in the group of 307. Twenty students 
in the Catalog Department were rated independently by a 
second rater, and twenty students were rated by the Assistant 
Loan Librarian as well as by the Loan Librarian. Reliability 
coefficients of .76 and .77 were obtained. A frequency 
distribution of the total scores made on the ratings for the 307 
assistants was made, and the chi-squared test indicated no 
significant departure from normality. 

To be promoted, an assistant must not only have had some 
experience in the library, but he must have also the ability to 
perform more difficult tasks than those assigned to him when 
he began his library work. Student assistants who prove to be 
accurate in their work and who have the necessary personal 
qualifications are eligible for promotion. The records of 
student assistants who were promoted were obtained from the 
pay rolls. 


²See Iowa State College Catalog, 1939-40, p. 118. 

The statistical methods used in analyzing the data in- 
cluded a study of the significance of differences of means and 
of the significance of correlation coefficients, and the use of 
regression coefficients. 

Relationship between the Predictive Variables 
and the Library Rating 
TABLE 1-A 

MEANS AND STANDARD DEVIATIONS ON FOUR VARIABLES 
FOR TOTAL GROUP OF STUDENT ASSISTANTS 

                   N.I.I.P.            A.C.E.    Library
            No.    C.T.      Grades    P.E.      Rating
Means       307    73.6      2.3       98.2      24.3
S. D.       307    19.59     .69       22.09     6.27


TABLE 1-B 

COMPARISON OF MEANS ON FOUR VARIABLES FOR GROUPS OF STUDENT 
ASSISTANTS CLASSIFIED ACCORDING TO LIBRARY EXPERIENCE AND SEX 

                                  N.I.I.P.            A.C.E.    Library
Group                      No.    C.T.      Grades    P.E.      Rating
Men with 
  library experience        73    74.7      2.434     101.9     25.4
Men without 
  library experience       163    69.8      2.274      94.1     22.9
Difference                         4.9       .160       7.8*     2.5*
Women with 
  library experience        40    83.2      2.435     107.8     26.5
Women without 
  library experience        31    78.5      2.351      98.7     26.1
Difference                         4.7       .084       9.1       .4
Women with 
  library experience        40    83.2      2.435     107.8     26.5
Men with 
  library experience        73    74.7      2.434     101.9     25.4
Difference                         8.5*      .001       5.9      1.1
Women without 
  library experience        31    78.5      2.351      98.7     26.1
Men without 
  library experience       163    69.8      2.274      94.1     22.9
Difference                         8.7*      .077       4.6      3.2*

(*Indicates that the difference is significant) 

Table 1-A shows means and standard deviations of the 
four variables for the total group of 307 student assistants. 
The comparison of means in Table 1-B indicates that although 
the women with library experience made higher scores on the 
four variables than the women without library experience, the 
mean differences were not significant. Men with library ex- 
perience made higher scores on the four variables than men 
without library experience and the mean differences for the 
Psychological Examination and the Library Rating were found 
to be significant. Women with library experience made higher 
scores on tests and grades and were given a higher rating than 
men with library experience, but the mean difference was found 
to be significant for the Clerical Test only. Women without 
library experience made significantly higher scores on the 
Clerical Test and were given a significantly higher rating than 
men without library experience. 


Table 2 shows the correlation coefficients for the three in- 
dependent variables with the library rating for the total group 
and for the subgroups. The correlation coefficients are positive 
but not high. 


TABLE 2 

CORRELATION COEFFICIENTS OF THREE VARIABLES WITH LIBRARY RATING 
FOR GROUPS OF STUDENT ASSISTANTS CLASSIFIED ACCORDING 
TO SEX AND LIBRARY EXPERIENCE 

                                 First-order correlation
                                      coefficients
                                 N.I.I.P.                  A.C.E.    Multiple
Group                     No.    C.T.         Grades       P.E.      R
Women with 
  library experience       40    .205         .256         .293
Women without 
  library experience       31    .249         .479*        .311
Men with 
  library experience       73    .487*        .250*        .112
Men without 
  library experience      163    [illegible]  .424*        .246*
Total Group               307    .393*        [illegible]  .263*     .456*

(*Indicates that the correlation coefficient is significant) 




TABLE 3 

CORRELATION COEFFICIENTS OF FOUR VARIABLES WITH LIBRARY 
RATING, AND INTERCORRELATIONS 
(Sample Fall Quarter 1939, N = 69) 

                        Psychological  Clerical  Adjustment            Library
Variable                Exam.          Test      Inventory   Grades    Rating
Psychological Exam.                    .698      -.020       .570      .417
Clerical Test                                    -.070       .501      .622
Adjustment Inventory                                         .0001     .030
Grades                                                                 .468





Table 3 shows correlation coefficients and intercorrelation 
coefficients for a group of 69 students who were given an 
additional test, the Bell Inventory, during the Fall Quarter 
1939. In the hiring and selection of student assistants in 
college library work, the measurement of ability is of prime 
importance. However, assistants must be able not only to do 
the work assigned, but they must also be able to get along 
with people and to have certain characteristics such as co- 
operativeness and dependability. Bell (1, 102-104) and Tyler 
(5) have reported very low correlation coefficients between 
the Adjustment Inventory and the Psychological Examination 
and the Inventory and grades. Since the rating scale used in 
this study contained items such as co-operativeness, initiative, 
and dependability, it was thought that a higher correlation 
coefficient might be obtained with the library rating than with 
the measures of intelligence. However, the correlation 
coefficient of .03 between the Adjustment Inventory and the 
Library Rating is so low as to be of no value as a predictive 
device for the selection and hiring of assistants. Negative 
correlations were obtained between the Adjustment Inventory 
and the Psychological Examination, and between the Adjust- 
ment Inventory and the Clerical Test. Bernreuter (2) points 
out that this type of inventory should not be used in selecting 
individuals for jobs since the complete co-operation of the 
individual is essential and this complete co-operation is difficult 
to obtain in a program which involves selection for jobs. 
Further research is needed to discover some instrument which 




will measure more satisfactorily the personal qualifications of 
the applicant. Very often it is possible to obtain through a 
carefully conducted interview an estimate of personal qualifica- 
tions which would make for or prevent the success of an 
applicant. 

Standard regression coefficients for the total group and 
for the Fall Quarter 1939 group are shown in Table 4. 

TABLE 4 

STANDARD REGRESSION COEFFICIENTS SHOWN FOR THREE PREDICTIVE 
VARIABLES FOR THE TOTAL GROUP AND THE FALL 
QUARTER 1939 GROUP 

                            N.I.I.P.            A.C.E.
Group                No.    C.T.      Grades    P.E.
Total                307    .305      .265      -.047
Fall Quarter 1939     69    .593      .244      -.135

These standard regression coefficients indicate that the Clerical Test 
contributed most to the library rating with grades second and 
the Psychological Examination third for the total group and 
for the subgroup, Fall Quarter 1939. 
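
As an illustrative check, the standard regression coefficients for the Fall Quarter 1939 group can be approximated from the correlations in Table 3. The sketch below assumes the usual relationship in which the intercorrelation matrix of the predictors, multiplied by the vector of standard regression coefficients, equals the vector of predictor correlations with the criterion. The paper does not describe its computing procedure, so this is an assumption, and the helper name is hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    # Augment the matrix so that row operations apply to both sides at once.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


# Intercorrelations from Table 3 (N = 69), in the order
# Clerical Test, Grades, Psychological Examination.
predictor_corr = [
    [1.000, 0.501, 0.698],
    [0.501, 1.000, 0.570],
    [0.698, 0.570, 1.000],
]
# Correlation of each predictor with the Library Rating (Table 3).
criterion_corr = [0.622, 0.468, 0.417]

betas = solve_linear_system(predictor_corr, criterion_corr)
names = ["Clerical Test", "Grades", "Psychological Exam."]
for name, beta in zip(names, betas):
    print(f"{name}: {beta:.3f}")
```

The values obtained this way lie close to the Fall Quarter 1939 row of Table 4. Small differences are to be expected because the published correlations are rounded to three places.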
Relationship between Predictive 
Variables and Promotions 

While student assistants who make average scores on tests 
or average grades and are given an average rating may get 
along fairly well in the library, the average does not represent 
a satisfactory goal. There is constant need for students who 
can do better than average work, and devices which will help 
to indicate the more promising assistants at the beginning of 
their work are important. 

TABLE 5 

SIGNIFICANCE OF MEAN DIFFERENCES ON FOUR CRITERIA BETWEEN THE 
GROUP PROMOTED AND GROUP NOT PROMOTED 

                                   N.I.I.P.            A.C.E.    Library
Group                       No.    C.T.      Grades    P.E.      Rating
Promoted                     42    88.26     2.76      112.38    29.12
Not promoted                265    71.23     2.26       95.98    23.50
Diff. of Means                     17.03      .50       16.40     5.62
S. D. of Diff. of Means             3.25      .11        3.77     1.04
Diff. of Means divided by 
  S. D. of Diff. of Means           5.3      4.5         4.3      5.4

It may be seen from Table 5 that 
the means of the group promoted are higher on all the vari- 
ables than the means of the group not promoted. Not only 
are the means higher but the differences between means were 
found to be highly significant. 
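
The ratios behind this statement can be recomputed from the entries of Table 5 by dividing each difference of means by the standard deviation of that difference. The sketch below is illustrative only: the dictionary and function names are hypothetical, the decimal points for grades are restored as .50 and .11, and the results may differ slightly from the published ratios because the table entries are themselves rounded.

```python
# Values transcribed from Table 5:
# (difference of means, S. D. of the difference of means).
table_5 = {
    "Clerical Test": (17.03, 3.25),
    "Grades": (0.50, 0.11),
    "Psychological Exam.": (16.40, 3.77),
    "Library Rating": (5.62, 1.04),
}


def critical_ratio(diff_of_means, sd_of_diff):
    """Ratio of a difference of means to the standard deviation of that difference."""
    return diff_of_means / sd_of_diff


ratios = {name: critical_ratio(d, s) for name, (d, s) in table_5.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Each of the four ratios computed this way exceeds 4, that is, each difference is more than four times its own standard deviation.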

Obviously not all of the students promoted were equally 
capable. Some of the scores on tests and grades of certain 
assistants in the group promoted fall below the average of the 
group not promoted. By eliminating the lower 25 per cent of 
scores made on the Clerical Test, the Psychological Examination, 
and grades for the group promoted, all scores below the 
average of the group not promoted were eliminated. The 
score on each test and the grade which divide the upper 75 
per cent of the group promoted from the lower 25 per cent 
of the group were taken as the critical score and critical grade. 
The critical score is 76 for the Clerical Test and 104 for the 
Psychological Examination (1938 edition) and the critical 
grade is 2.38. 
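
The cutting procedure can be sketched as follows. The scores below are hypothetical and are not data from the study. The rule used here, taking the lowest score retained after the lower 25 per cent of the promoted group is set aside, is one reasonable reading of the procedure, since the paper does not say how the dividing point was interpolated.

```python
def critical_score(promoted_scores, lower_fraction=0.25):
    """Lowest score remaining after the lower fraction of the group is removed."""
    ordered = sorted(promoted_scores)
    # Number of scores eliminated from the bottom of the distribution.
    eliminated = int(len(ordered) * lower_fraction)
    return ordered[eliminated]


def meets_critical_score(applicant_score, cutoff):
    """True when an applicant's score is at or above the critical score."""
    return applicant_score >= cutoff


# Hypothetical Clerical Test scores for eight promoted assistants.
hypothetical_promoted = [62, 70, 76, 81, 88, 90, 95, 104]
cutoff = critical_score(hypothetical_promoted)
print(cutoff, meets_critical_score(80, cutoff), meets_critical_score(70, cutoff))
```

With these made-up scores the two lowest values are set aside and the cutoff falls at the third score in rank order.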

Since the results show that mean differences between the 
group promoted and the group not promoted are significant, 
high scores on the tests used and high grades may be considered 
to be of value as predictive devices in the selection of 
student assistants for college library work. The critical scores 
determined for this group may be used as a guide in selecting 
assistants in the future who would have high promise of 
success. 


REFERENCES 
1. Bell, Hugh M. The Theory and Practice of Personal Counseling, 
with Special Reference to the Adjustment Inventory. (rev. ed.) 
Stanford University Press, 1939. 
2. Bernreuter, R. G. “The Present Status of Personality Trait Tests,” 
Educational Record, XXI (1940), Supp. 160-171. 
3. Filer, H. A. and O’Rourke, L. J. “Progress in Civil Service Tests,” 
Journal of Personnel Research, I (1923), 484-520. 
4. Symonds, Percival Mallon. Diagnosing Personality and Conduct. 
New York: Century Publishing Co., 1931. 
5. Tyler, Henry. “Evaluating the Bell Inventory,” Junior College 
Journal, VI (1936), 353-357. 










THE ADMINISTRATION OF GROUP TESTS 


ERNEST M. LIGON 
Union College 


“ANYBODY with a strong voice who can read can give 
group tests.” Unfortunately this opinion is very widely 
held. Even a superficial consideration of the responsibility of 
a group test situation should quickly dispel such an idea. 
Actually, good group testing is much more difficult than indi- 
vidual testing. The most perfect tests available are as value- 
less without good examiners as the best surgical instruments 
without good surgeons. 

Among the prerequisites of good group testing are: that 
all of the subjects understand the instructions, that they all 
work throughout the assigned time at their optimum level of 
achievement, that they are in no way helped, hindered, or 
distracted by one another, that they do not quit trying or omit 
any section of the test, that examiners give instructions ade- 
quately and in a stimulating, effective tone of voice—not a 
dull, bored monotone—and that proctors are observing every 
movement in the group, stimulating lagging souls, inhibiting 
wandering eyes, and detecting failure to follow instructions. 
Literally millions of group tests are administered every year. 
On scores derived from them an equal number of judgments 
are made affecting in some way, often extensively, the lives of 
those taking them. 

An examiner giving an individual test can easily determine 
how his subject is reacting to the test problems. A group test 
examiner has that responsibility for as many subjects as are 
in the room, which may range from only a few to several hun- 
dred. Now that group tests are being called upon to play such 
a large role in our war effort, it behooves us more than ever 


387 














EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


give them effectively, as well as to 


construct them carefully and score and interpret them ac- 
curately. Every man who, due to no fault of his own, makes 


to make every ettort to 


a score on one of these tests which does not reflect his true 
capacity may thereby be put in the wrong place. It does not 
require proving that in so specialized an organization as the 
modern mechanized army, a very important part of its success 
depends on getting the right man in the right place. 

This paper has been prepared on the basis of two types of 
evidence. In the first place, the author has had several years 
of experience in administering many different kinds of group 
tests as well as individual tests. During this period, it has also 
been necessary to train many students to do so. In the second 
place, although the literature contains very little either in 
periodicals or in books on measurement concerning this phase 
of measurement procedure, almost all group tests include in 
their manuals of procedure such instructions as seem desirable 
for their administration. A number of these have been 
examined and the principles included in them collected and 
organized into this paper. The ones used were selected simply 
because they were ones with the administration of which the 
author has had wide experience. It seems probable that they 
constitute a fairly representative sample. 


I 
The aim of a group test is to measure differentially a 
group of homogeneous¹ individuals with respect to some 
simple or complex variable. 
Basic requirements, if scores are to be significant, presup- 
pose that all subjects 


(1) give their optimum performance, and 
(2) do so for the full period of the allotted time. 


¹By homogeneous is meant that group tests are always administered to a 
group selected because all of its members can be measured with respect to the 
variable or variables involved and by means of the test being used. 







These, in turn, presuppose for the examiner 

(1) that he make perfectly clear to the subjects what 
they are to do and how they are to do it; 

(2) that he stimulate them to do their best; 

(3) that he and his proctors note and make adequate 
adjustments for individual deviations, such as 
mental confusion, indifference, impulsiveness, 
daydreaming, making right responses in a wrong way, 
cheating, and the like, which might destroy the 
validity of test results. 


II 


The most common sources of error in group test adminis- 
tration and methods for controlling them follow. 




(1) Misunderstood instructions 

A very common misconception is that if printed instructions 
are read word for word, this is sufficient to hold conditions 
constant. The fact is that, unless the test itself consists of 
instructions, conditions are held constant only when every sub- 
ject starts to work with a complete and accurate understanding 
of what is expected of him. Furthermore, the method of read- 
ing instructions is quite as important as their wording, so far 
as the subjects’ ability to comprehend them is concerned. All 
instruction manuals ought to be marked as to emphasis and 
pauses, as well as carefully worded. Such a practice would add 
substantially to the accuracy of group tests. 

The speed of reading instructions should be a function of 
the speed of comprehension of the subjects. The alert ex- 
aminer, by watching his subjects, can know when his subjects 
understand what he has just said. Some instructions, therefore, 
can be read more rapidly than others, and with some groups of 
subjects more rapidly than with other groups. The time for 
reading should be a little more than adequate for all subjects. 

Enunciation is important, considering that in most large 
groups, such as in the army, several “dialects” will probably 
be represented. All must understand. 




The auditory acuity of the subjects must be considered. 
Instructions are more certain to be understood if the subject 
can read them silently as the examiner reads them aloud, thus 
emphasizing both visual and auditory cues. Conversely, how- 
ever, the examiner must read them aloud. To ask subjects to 
read instructions silently without oral reading by the examiner 
almost always results in errors in reading and even failure to 
read all of them. Furthermore, as previously indicated, the 
emphases indicated by the reader help in understanding. Visual 
illustration of the mechanism of recording answers will help 
to avoid many clerical errors. 

Delayed reaction instructions ought to be repeated near 
the time during the test when they are to be carried out. In- 
structions about what to do when a certain part of the test is 
reached are almost certain to be forgotten by some subjects 
unless they are reminded of them. 


Proctors should check to be sure that all of the subjects 
do understand the instructions, by seeing, for example, how 
they answer the first two or three questions, usually simple 
ones which all ought to answer. 

In some tests, ability to understand instructions is part of 
the test. If this is to be the case, there should be a great many 
different instructions separately scorable. Otherwise, failing 
to understand them may produce an undistributed minimum. 
Occasionally in screen tests this may be a desirable condition. 
It is also true, at the other extreme, that if instructions are so 
complete as to constitute answers to the test questions, there 
may be an undistributed maximum. It seems unwise, however, 
to include in the time limits of a test nonscalable sets of in- 
structions. 

Added afterthought directions given in the middle of a 
test by an examiner who has not adequately prepared his 
instructions beforehand constitute an important source of 
distraction. 


The amount of practice necessary to make instructions 
clear to subjects will vary among various groups. It needs to 




be adequate for the poorest of the subjects. One hundred per 
cent understanding is necessary if subjects are to be measured 
accurately. 


(2) Careless Errors 

Speed and accuracy are two distinct qualities. Subjects 
should know how much each is to be weighted in the scoring 
of a given test. For example, a score based on right minus 
wrong on a clerical aptitude test can be raised appreciably by 
working very rapidly even though the number of errors is 
thereby increased. However, most employers of clerical work- 
ers would rather have employees who get sixty correct answers 
out of sixty attempted than eighty-five right out of one hun- 
dred attempted, although the score of the latter is higher than 
the former. 
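
The arithmetic of this example can be made explicit. The sketch below assumes that the score is simply the number right minus the number wrong and that items not attempted carry no penalty; the function names are hypothetical.

```python
def rights_minus_wrongs(right, attempted):
    """Score a test as number right minus number wrong; unattempted items are ignored."""
    wrong = attempted - right
    return right - wrong


def accuracy(right, attempted):
    """Proportion of attempted items answered correctly."""
    return right / attempted


# The two workers described in the text.
careful_score = rights_minus_wrongs(60, 60)
rapid_score = rights_minus_wrongs(85, 100)
print(careful_score, accuracy(60, 60))
print(rapid_score, accuracy(85, 100))
```

On these assumptions the rapid worker scores 70 against 60 for the careful worker, although the careful worker made no errors and the rapid worker made fifteen.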

Persuading subjects who finish before the time is up to 
reread the questions and check the answers is a help in avoid- 
ing careless errors. Proctors need to be alert to the possi- 
bility of subjects overlooking large sections of the test. 

Whether or not subjects are to be encouraged to guess on 
multiple choice questions should be standardized. It is com- 
mon procedure for examiners to warn the subjects that a 
wrong guess counts off more than an unanswered question. 
They neglect to add that a right guess counts more than an 
unanswered question. 

Careless errors which are a result of the faults of the 
examiner need to be watched for. Timing errors are the most 
common of these. A full-face second hand is a necessary part 
of a group testing time-piece. Extra pencils need to be at 
hand, so that securing a new one from a proctor does not 
require a large amount of time. If subjects have two pencils 
to begin with, the possibility of this difficulty is decreased. 
Pens should never be used, since the inevitable corrections 
made by a subject become increasingly difficult or even impos- 
sible to interpret. 

Correct filling in of the forms on the front of the usual 
test blank is difficult. These need to be kept at a minimum and 




filled out systematically under instructions from the examiner. 
The date should be stated or written in a prominent place in 
the front of the room. 

Distractions are to be eliminated as far as possible. A 
distraction is a subjective concept. It is whatever distracts the 
subject. Too quiet a room may often be more distracting than 
one which is noisy. Visual distractions are usually more impor- 
tant than auditory ones. People walking by where they can be 
seen, proctors’ too obvious observation, neighbors turning to 
later pages of the test too soon, some subjects leaving the 
room too early, and excessive materials on desks are a few of 
the more common distractions. 


(3) Low Motivation 


If group test scores are to be adequate measures of what 
they try to measure, they presuppose that the subjects do their 
best for the full time limit of the test. Unless the test meas- 
ures motivation, maximum motivation is a prerequisite for 
accurate scores. There are at least three types of motivation 
which, when characteristic of group test subjects, tend to 
decrease the reliability of the results. 

(a) Sense of inadequacy. Subjects often see that there 
are many problems on the test which are impossible for them 
to answer and infer therefrom that they are failing the test 
and so give up without trying. This is due to their experience 
with school examinations, in which they are expected to know 
all of the material asked for. A statement of the nature of 
mental tests will often remove much of this misconception, 
emphasizing especially that a good test must be long enough 
and sufficiently difficult that the best subject cannot make a 
perfect score. Then, too, a statement to the effect that too 
high a score is quite as bad as too low a score for a subject 
will help, emphasizing the fact that the accurate score is the 
only good one. This source of error is especially characteristic 
of achievement tests in fields in which the subject has had no 
formal training, such as mathematics and science, if that is the 
case. The subject often gives up without trying, whereas a 




genuine effort would often produce astonishingly good scores, 
even if the subject has had few, if any, formal courses in 
these fields. 

(b) Sense of indifference. Many subjects may have the 
idea that the test is not important and that it does not make 
any difference what they make on it. There may be initial in- 
difference, or a lack of enthusiasm for even starting a test, and 
there may be executive indifference, or a decrease in enthusi- 
asm as the testing period progresses. 

Overcoming initial indifference depends on (a) the atti- 
tudinal preparation of the subjects for the test, and (b) the 
attitude inspired by the test examiner in the beginning of the 
test. 

As subjects are prepared for a test, they should be told the 
purpose of the test; what it tests and how one can know that 
it tests it. A brief statement is often very effective, pointing 
out that tests are constructed by experiment and not by arm- 
chair theory and should be criticized only when experimental 
data are available. This is especially important with highly 
intelligent subjects. A discussion of the nature of direct and 
indirect tests, with a clear statement of which type is being 
administered, is valuable. If a subject knows that he cannot 
predict the right answer and that he may only destroy his 
chances of getting a good score by endeavoring to do so, he is 
less likely to try to answer the questions in whatever way he 
thinks may get a good score instead of giving straightforward 
answers. 

The subject should also be informed as to the use to be 
made of the results. If they are confidential, to be given to 
no one except him or with his permission, his attitude is im- 
proved by assuring him of this fact. Honesty and frankness 
with the subjects is almost always an asset in getting good 
motivation. 

Then, too, the attitude of the examiner is important. If 
he by his posture and tone of voice indicates that he is bored 
by the whole procedure, he will probably inspire this same 




attitude in his subjects. If he stands erect and alert while 
reading directions, and speaks with a tone of enthusiasm in 
his voice which suggests that he thinks the test is interesting, 
the effect on his subjects will be appreciable. Just as in every 
individual test every subject is the “most important” subject, 
so in group tests, every group is the “most important” group. 
A good examiner never lets down. 

The intensity of the examiner’s voice needs to be con- 
trolled. However, subjective intensity is not always measured 
in decibels. It is a well known principle in public speaking 
that to lower the voice both in pitch and loudness is effective 
in getting attention. This probably is due to the fact that it 
is the very opposite of the common procedure. An incisive, 
firm, ringing signal “go” does much to produce good initial 
motivation. When loud-speakers are used, their value lies in 
the fact that more objective intensity can be gotten without 
greater apparent intensity on the part of the examiner. In 
any case, the attitude and posture of the subjects should be 
one of alert attention at the signal “go.” 

Overcoming executive indifference is even more difficult 
than overcoming initial indifference. The most important fac- 
tor in this respect, aside from the personality traits of the 
subjects themselves, is the internal nature of the test. In 
young children, every test needs to be put on a game level, 
in order to elicit best efforts. If tests are of any considerable 
length, this also holds true even for those given to adult 
groups. Even when the subjects have the best intentions and 
the most complete awareness of the importance of the results, 
it is difficult not to let down with continuing boredom. Test 
construction and the organization of test batteries ought to 
be based on this inherent factor and provide for the inclusion 
of interest stimuli at frequent intervals. Test reliabilities 
would thus be improved. A spirit of competitiveness, if not 
overdone, is of value in executive motivation. It must be 
geared to the type of subject with whom the test is used, but 
is always stimulating when used wisely. 




The attitudes of the examiner and proctors, even during 
the times when the subjects are writing and no instructions are 
being given, are still a factor. If they relax and slouch around 
in groups for non-test conversation, this will carry over to the 
subjects. Examiners and proctors who are alert during the 
whole testing period have an important role in the maintenance 
of motivation in their subjects. 

Mental fatigue is largely a product of imagination. Some 
groups become tired after fifteen minutes. Others will con- 
tinue at full speed for several hours. Subjects ought to be 
warned, in view of the importance of the results, about the 
shortsightedness of allowing fatigue and boredom to decrease 
the quality of their effort. They should be informed of the 
facts concerning the nature of mental fatigue and as to how 
long it is possible for the human mind to persist at a high level 
of effort and efficiency. 


(4) Mental confusion due to too great excitement 


It is possible, especially with some subjects, to get too great 
motivation as well as too little. The subjects need in the very 
beginning to be put at their ease, without a loss of desirable 
motivation. The methods employed by every good individual 
tester can with modification be applied in group testing. Thus, 
an individual tester adjusts the speed and intensity of his voice 
to the speed and intensity of his subject. If the subject is 
dawdling he can speed him up by a slight increase of these 
qualities in his own voice. If the subject is obviously too 
excited, he can quiet him down by the slower speed and lower 
intensity of his voice. This can be done also by the effective 
group tester. 

Two common causes of overexcitement can be diminished 
by the good examiner. If the subjects worry about the tests a 
long time in advance, it may be wiser not to forewarn them at 
such long periods. No forewarning at all, on the other hand, 
might produce a complete mental disintegration in some sub- 
jects. Instructions which read “work rapidly,” if given with 
too great fervor, may sometimes decrease the efficiency of 
certain subjects. Many people cannot work well under pressure 
of time or being hurried. The calmness and inspired confidence 
of the examiner can instill an alertness in his subjects without 
overexciting them. 

When the instructions or the paraphernalia, such as 
machine-scoring equipment, are too complicated, plenty of 
time needs to be taken to familiarize the subjects with them 
to insure confidence in their use. 


(5) Dishonesty of subjects

Observing and copying one’s neighbor’s work is only one 
of the forms of cheating done by subjects on group tests. It is, 
furthermore, the easiest to control. It should simply be made 
impossible. Honor systems should never be used in group 
testing since relative scoring can be made invalid by even a 
few dishonest subjects. But there are other forms of dis- 
honesty. Precheating is very common. Often people come to 
psychologists to “get a copy of all the tests to be had” so as
to be ready for some coming group tests. Obviously, most 
tests cannot be prepared for. But subjects attempting to do 
so show by their attitude their unwillingness to give the most 
desirable type of co-operation. It stands to reason that to 
whatever extent a subject succeeds in this sort of effort, he 
destroys the value of the test to him as an accurate indication 
of his ability or aptitude. This type of cheating can best be 
eliminated by the process of urging upon the subjects the fact 
that a high score is a bad score unless it is accurate. 

Getting information concerning the tests from individuals 
who have taken them is another source of error due to dishonesty.
Subjects ought to be informed of the possible consequences 
which may arise from such action. If, for example, by this 
means one succeeds in getting into the air corps who would 
have been rejected if properly tested, and is killed in training, 
his informer can hardly be thought of as having done him a 
favor. 

Such test procedures as requiring
“pencils up” between tests and “folding back booklets so that
only one page at a time is visible” are designed to eliminate
errors belonging in this classification. Clear instructions as to 
whether it is permissible to proceed to another page or go back 
to a former page without further signal ought to be given 
both orally and on the printed page. 

Many subjects get help from the examiners themselves. 
Some examiners indicate correct answers in their tone of voice. 
Others will answer individual questions which may give the 
questioner an undue advantage. Such questions, if answered 
at all, ought to be answered so that all can hear. An example 
of how the examiner’s voice can help the subject is found in 
giving the digits test in the Binet. If the digits are grouped or 
read too rapidly the test is much more easily passed. 

It is well for the examiner to have a correct attitude con- 
cerning the nature of his job. The job of a teacher is to help 
his students learn. The job of an examiner is to measure, not 
to teach. 


(6) Not working full time 


If time is a factor, a test should be so constructed that 
the best subject cannot finish within the prescribed time limit. 

It is common procedure in achievement tests to give ade- 
quate time for even slow subjects to finish. This results in 
the best subjects finishing early. If they leave the room, this 
becomes a serious distraction to the slower ones. It might be 
desirable to include non-scorable questions to prevent this from 
happening. It is true that holding subjects after they have 
finished is usually not good for testing morale. 

It is difficult on long tests for subjects not to let their 
minds wander from time to time. This is a factor for the 
proctors to deal with. If the proctors are alert, both by their 
presence and their active efforts, they can keep the subjects 
working consistently. The discussion of executive indifference 
is also related to this point. Subjects, of course, should know 
whether or not it is a timed test, and in the case of long tests 
should be warned at regular intervals as to the amount of time 
consumed or remaining for the test. 




Another factor which enters into the timing problem of 
tests is that of mental set. When long tests covering entirely 
different types of material are given successively, a longer 
interval needs to elapse between them. This is not due pri- 
marily to a fatigue factor, but to the need for changing the 
mental set from one field to another. Teachers who have 
taught two different courses in successive hours will recognize 
the importance of this principle. 

(7) Size of group and group inter-distractions 

Good morale in a group is essential to maximum perform- 
ance. How large a group can be before distracting factors 
enter in due to size, varies. One group of four hundred can be 
tested better than another group of fifty. The constitution of 
the group is a factor, as is the ability of the examiner. When 
the members of the group do not know each other, morale is 
more easily maintained than when they do, unless intergroup 
rivalries can be used as a motivation. Groups of older age 
levels are usually easier to control than younger groups. 
Groups competing with each other have better morale than 
those having no such sense of group solidarity. Telling a new 
freshman class or group of draftees that they are competing 
with preceding classes or groups is an incentive for group 
morale. 

Once group morale is lost, it is very hard to regain. Let 
there be a few sighs, whistles, groans, shufflings of feet, low- 
intensity grumblings, or catcalls and the situation for good 
group testing is almost hopelessly lost. The leadership of the 
examiner and the alertness of the proctors will play a large 
part in this. 


III


This paper has attempted to indicate the difficulties in- 
volved in the administration of group tests and to point out 
some methods for making them good measures of the variables 
involved. Every subject ought to leave the testing room feeling
confident that he has done his best, and that the score
assigned to him will be representative of him, even if he has
missed a large percentage of the questions. One does not pass 
or flunk tests any more than he passes or flunks measures of 
height and weight. It is the task of the examiner to make 
this clear to each subject and get from him a sample of his 
best performance. More thorough training of group testers 
and a larger sense of their responsibility among them will 
make the increasing use of group tests a far greater contribu- 
tion to the problems of adjustment than if the common notion 
that “anybody can give group tests if he reads the printed
instructions word for word” continues to be the prevalent one.
It will be obvious that not all of these principles will be ap- 
plicable to all group tests, but it should be equally obvious 
that administering any group test is difficult and, when well 
done, constitutes a highly skilled act. 





THE PURPOSE, ORIGIN, PLAN OF PROCEDURE, 
AND VALUES OF THE NATION-WIDE EVERY 
PUPIL SCHOLARSHIP TESTS* 


































H. E. SCHRAMMEL 
Kansas State Teachers College 


Purpose 


In the field of measurements and the objective testing movement,
the Nation-Wide Every Pupil Scholarship Test is
one of the major significant developments. Because of the far- 
reaching influence of these testing programs in this respect, it 
was felt that it would be worth while to recount before the 
membership of the National Association of Teachers of Edu- 
cational Measurements the major details of their purposes, 
origin, methods of procedure, and values. 

The purpose of the Every Pupil Scholarship Tests is the
promotion of scholarship. They are a valuable agency for
stimulating scholastic endeavor on the part of the students. 
They stimulate good teaching as well as application to better 
learning. They vitalize education and make schools more 
worth-while in the lives of the students. 


Origin 

The Nation-Wide Every Pupil Scholarship Tests sponsored
by the Bureau of Educational Measurements of the
Kansas State Teachers College of Emporia had their origin 
twenty years ago in connection with the county and state 
Scholarship Contests sponsored by this college. 

The first county contest in academic subjects, of which we 
find a record, was conducted by the Bureau of Educational 
Measurements in 1922 in Cloud County, Kansas. The first
State Scholarship Contest on record was conducted by the
Emporia State College in 1923.

*Paper read at meeting of February 24, 1942, in San Francisco.


For a time the county contest movement was very popular, 
and the state contest movement also developed at a marked 
rate. The latter is still a popular event in Kansas. This spring 
the twentieth annual State Scholarship Contest will be con- 
ducted by the Emporia State College at thirty conveniently 
located centers of the state. Last spring over 3,500 students 
from approximately 200 high schools participated in this 
event. 


In the county contests at first only a few of the best pupils 
participated from each school. Hence the suggestion was made 
that a plan be devised which would stress excellence in achieve- 
ment of the entire class in a curricular field. Thus in the spring 
of 1924 two schools conducted a contest in one subject which 
involved a larger number of pupils from each school, each 
set of pupils taking the tests in their own school. The test 
papers were provided and scored by the Emporia State Col- 
lege. This was known as a dual contest. During the 1924-25 
school year there was much demand for objective tests for use 
in similar inter-school contests in which every pupil in one or 
more specified subjects of each of the competing schools par- 
ticipated. Because of the increased demand for new tests for 
this purpose, a plan was devised for announcing in advance 
the subjects and dates for which tests would be made available 
for inter-school competition. During the first year that this 
plan was in operation, many schools used the tests for inter- 
school competition in which all the pupils of each school par- 
ticipated and the median score was used as the measure of 
comparison. A few schools, however, were not matched with 
any other schools for competition, but they desired to use the 
tests in order to be able to compare their results with the re- 
sults of the other schools for the purpose of determining the 
relative excellence of their own classes. Hence norms were 
computed from all scores in each subject and provided to all 
the participating schools. Thus the Every Pupil Contest idea
soon was superseded by the principle of a testing program in
which schools voluntarily participate in order to obtain an 
objective measure of the attainment of pupils and classes. This 
is the plan that has been retained in the main, with the intro- 
duction from time to time of valuable perfections and improve- 
ments. 


Plan of Procedure 


At present the plan of procedure of the Nation-Wide 
Every Pupil Tests is as follows. The Bureau of Educational 
Measurements annually announces two dates for the testing
programs. These come at the close of the first semester and 
near the middle of April. Bulletins are sent out giving the 
list of subjects for which new tests will be provided for each 
testing date. This year thirty-four new tests were provided 
for the testing program scheduled for January 8, and forty- 
four tests will be provided for the next testing program an- 
nounced for April 8. Approximately 1,000 schools of the 
country obtain tests at mid-year and 1,500 for the end-of-year 
Test. About three-fourths of a million copies of the tests are 
used annually. 


The Bureau secures competent volunteers to construct the 
tests. These consist usually of teachers in Kansas and else- 
where who are well trained in their respective curricular fields 
and who have also had some training in the field of measure- 
ment. The tests are edited at the Emporia State College by 
test construction and curricular specialists. The printing is 
done in the college print shop. Several dozen student assistants 
are employed in the office of the Bureau and in the print shop 
to handle the routine duties of typing, proofreading, filling 
and shipping orders, summarizing scores, computing percentile 
norms, invoicing, and keeping accounts. 


As test orders are received from all parts of the country, 
norms are computed from the scores reported by the participating
schools for each curricular field both for the whole
group and also separately for individual states from which a
sufficient number of scores are reported to warrant it. 


A summary bulletin of results is printed in compact form 
and furnished gratis to all participating schools within three 
weeks after the scheduled testing date. 


For one of the recent Every Pupil Scholarship Testing 
Programs it was found that the process of computing the 
measures reported in the Summary Bulletin of Norms entailed 
the handling of 162,412 pupil and class scores, the construc- 
tion of 405 frequency tables, and the calculation of 3,429 
statistical measures. The norms computed are based on from 
several thousand to over ten thousand pupil scores for each of 
the various school subjects and grades for which the tests are 
provided. 
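
The article does not spell out how the Bureau derives its percentile norms from the frequency tables. As a minimal sketch only, assuming the common midpoint convention (percentile rank equals the percent of scores below a given score plus half the percent at that score), the tabulation might look like the following. The function name and the sample scores are hypothetical and are not taken from the Summary Bulletin.

```python
from collections import Counter


def percentile_norms(scores):
    """Build a {raw score: percentile rank} table from a list of raw scores.

    Assumption: midpoint convention, i.e. rank = 100 * (below + half of ties) / N.
    """
    frequency = Counter(scores)      # the frequency table
    total = len(scores)
    norms = {}
    below = 0                        # running count of scores below the current one
    for score in sorted(frequency):
        norms[score] = 100.0 * (below + 0.5 * frequency[score]) / total
        below += frequency[score]
    return norms


# Hypothetical pupil scores for one subject and grade.
sample_scores = [12, 15, 15, 18, 20, 20, 20, 25]
norms = percentile_norms(sample_scores)
for score, rank in norms.items():
    print(score, round(rank, 2))
```

A class median can be read against the same table in the same way as an individual score, which is the kind of interpretation the summary bulletin is meant to support.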


Validity and Reliability 


What method is used to insure that the tests possess ade- 
quate validity and reliability, the most important criteria for 
evaluating tests, is a question worthy of consideration at this 
point. While it is not claimed that the tests are fully stand- 
ardized, they do compare favorably in these respects with the 
better standardized publications. 


For insuring validity the following precautions are taken: 
First, as a rule the test builders are persons who teach classes 
in the curricular fields covered by the tests and who therefore 
have a good perspective of the content to be included. Second, 
content studies are made of textbooks and courses of study and 
the test items budgeted in accordance with the content distribu- 
tion. Third, the editors consist of test construction specialists 
and supervisors and teachers of curricular fields. Fourth, 
cumulated studies of pupil responses on test items over a period 
of years are available and used. Fifth, cumulative criticisms 
from teachers who have used the tests over a period of years 
are available and utilized. Sixth, in fields where studies from 
previous editions of the tests are not available, preliminary 
editions are provided and tried out in representative classes. 




For insuring reliability, studies are made with preliminary 
editions and on tests provided over a period of years. In a 
field where tests are regularly provided, the degree of relia- 
bility may thus be predicted with a fair degree of accuracy. 

The results are used extensively by teachers, principals, and 
pupils. Many expressions are annually received from schools 
as remote from the center of this movement as Montana, 
Florida, Texas, Maine, and California. Teachers are eager to 
learn how their classes rank in comparison with the classes of 
dozens of other schools in which the tests were administered 
on the same day during the current school term. Moreover, 
they want this information without much delay. It must be 
available promptly during the current school year to be of 
maximum value to them. Thus far we have been able to live 
up to the goal of mailing the results to the schools within 
three weeks from the day the tests are administered. 




Pupils, too, are eager to note how they rank in comparison 
with the pupils in their own and other schools and they want 
this information before it becomes ancient history. By pro- 
viding objective measures which can be simply and intelligently 
interpreted, pupils are motivated to work for greater excel- 
lence in achievement in the various curricular fields. 


Many principals conserve the test results from year to year 
by filing the cumulative record of each pupil on a convenient 
card which has been provided. Because all scores are similarly 
interpreted, this provides a wealth of material for use in coun- 
seling and personnel work. Some schools also issue certificates 
of excellence to pupils whose scores receive a high percentile 
rank. In this manner excellence of achievement is further 
stressed and motivated. 


Values Accruing from the Plan 


The values accruing from the Every Pupil Scholarship 
Tests are manyfold. These may be roughly classified as primary
and as secondary values. The major primary values are
the following: 


A. The plan of Every Pupil Scholarship Tests stimulates
an intelligent interpretation of test results. For this
purpose, percentile scores are provided for each subject
and grade. Simple instructions are given for interpreting
class median scores, as well as individual scores,
into corresponding percentile scores. Because many
different methods of interpreting standard test scores
are resorted to, teachers are frequently at a loss in
regard to the procedure in making correct interpretations.
All too frequently valuable results from the use
of standard tests are misinterpreted or not interpreted
at all. Through our process of education in this respect
over a period of years, many teachers and principals
have exhibited that they have learned to make
correct and meaningful interpretations of their results
by the percentile score method and that they like the
simplicity of this method.


B. The plan motivates pupil and class effort because the
results are objective and the interpretation is intelligible
not only to teachers but can be presented graphically
and intelligently to pupils.


C. The plan motivates teachers in the construction and
use of better home-made tests.


D. The plan motivates teacher effort in better planning
of instruction, finding of weaknesses of instruction, and
so on.


E. The plan motivates diagnosis of weaknesses and
efficient remedial work in instruction.


F. The plan challenges the teacher to set up outcomes
objectives and to look for methods of determining the
extent to which such outcomes are realized. Too frequently
teachers are content in the assumption that
valuable intangible outcomes accrue from their instruction,
when in reality this may be far from the true
conditions. By being consistently exposed to objective
measurements, it is hoped that in time they will become
skeptical of these assumptions and seek to evaluate the
actual lasting values of their efforts.







Among the secondary values, the following are a few of 
the more obvious: 


A. The plan aids the test builders to become more proficient
in devising more efficient and valuable tests.


B. On the Emporia State College campus the plan aids
several dozen students annually who need employment
to finance their college education.


C. For the Measurements classes on the campus, the plan
provides an invaluable laboratory. All of these students
are in some concrete measure exposed to test
production, standardization, use, scoring, interpretation,
and so on.


D. The plan affords unusual opportunity for teaching students
in Measurements classes and other employees the
use of mechanical devices in handling statistical and
other data. For example, the Bureau office contains
hand and electric calculating machines, comptometer,
clip boards, postal rate scales, postal wrapping device,
and Dictaphones. The college print shop is equipped
with linotype, rotary press, folding machine, stapling
machine, and other equipment essential to a modern
printing establishment. A large number of students
receive first-hand experience in the operation of these
devices in connection with their employment made possible
by the Nation-Wide Every Pupil Scholarship
Tests.


E. The plan of the Every Pupil Testing Programs makes
it possible to standardize more and better tests than
would otherwise be possible. Where normally scores
for norms would be difficult to obtain, and at considerable
cost, a much larger sampling is possible for the
norms and at practically no cost. This makes it possible
to pass the advantages on to the patrons in terms
of more and better up-to-date tests at an unusually low
cost of production. During the writer’s directorship
of the Bureau of Educational Measurements, fifty-seven
tests have been standardized. Most of these are
published by the Emporia State College, but a few
are published by some of the other leading test publishers.


F. For many schools, participation in the Nation-Wide
Every Pupil Scholarship Tests has furnished excellent
material for local school publicity. In this manner taxpayers
and patrons are made aware that their schools
are not only seeking to excel in the so-called extra
school work, but also in the regular curricular fields.


G. Through use of these tests, capable pupils, who otherwise
might be content to terminate their education upon
completion of junior or senior high school, make a
discovery of their own scholastic potentialities and are
inspired to seek further development in college.

















A TEST FOR SELECTING AND TRAINING 
INDUSTRIAL TYPISTS 


CLIFFORD E. JURGENSEN 
Kimberly-Clark Corporation 


The Typing Ability Analysis reported here was designed
to assist in training typists for industrial positions,
and to assist in hiring typists who can adequately fill
such positions. It was developed after tests commonly used 
in high schools and business colleges had been found to be 
valueless in predicting typing success in stenographic and 
secretarial positions in Kimberly-Clark Corporation. 

Analysis of reasons for the failure to predict job success 
by means of customary typing tests indicated that such tests 
do not emphasize sufficiently the major factors found in the 
industrial situation. Most typing tests provide an adequate 
measure of the mechanics involved in speed and accuracy when 
typing from printed copy. They fail to measure the mechanics 
of handling paper, placement of paper, use of tools, etc. They 
also fail to measure the non-mechanical aspects of the job 
which are the major factors in differentiating between suc- 
cessful and unsuccessful typists. Important non-mechanical 
aspects which should be measured include following instruc- 
tions, noting and correcting errors, and the typing of diverse 
kinds of material rapidly, accurately, and in good form. Usual
typing tests emphasize straight copying, and neglect the mosaic 
involved in a composite typing job. 


Construction of the Test 


Job analyses were inspected and conferences held with
supervisors of industrial typists to determine the kinds of 
typing most often done, errors most often made, characteristics
which differentiate between successful and unsuccessful typists,
etc. Supplementary data were secured by inspection of the 
corporation’s files, and conferences with private secretaries 
and others in key typing positions. After analyzing and classi- 
fying the data obtained, files were inspected to obtain represen- 
tative samples of the various kinds of typing. These samples 
were modified so they would be suitable for a test of typing 
ability, modification consisting primarily in the use of fictitious 
names and addresses. Standard typing procedures were sub- 
stituted for those unique to the corporation concerned, in order 
that persons unfamiliar with procedures within the corporation 
would not be penalized unfairly. 


After collecting and adapting twenty samples of different 
kinds of typing, instructions were prepared for each. Extreme 
care was taken that these instructions emphasize factors pre- 
viously found important in differentiating between successful 
and unsuccessful typists. The preliminary test form of twenty 
items was administered to typists ranging from those known 
to be unsatisfactory to highly successful private secretaries. 
Tests were administered individually, and a record was kept 
of the time required to complete each test part, errors made 
in each part, comments on the test before it was scored, and 
comments of the typist after she had been told how her test 
results compared with those of other persons. This procedure 
was followed with a group of thirty typists half of whom were 
fully experienced high caliber stenographers or secretaries, 
and half of whom were inexperienced typists in the mailing 
department. Unanimous agreement among supervisors ac- 
quainted with each typist was required for inclusion in one of 
these two groups. Although the number of girls included in 
each group was small, the number was considered sufficiently 
large to warrant preliminary modification of the test. As a 
result of this tryout, the entire test was extensively revised, 
and seven of the twenty work samples were eliminated. The 
revised test was administered to a group similar to the one 
first used, the procedure and conditions of administration remaining
the same. Six additional work samples were eliminated
on the basis of low validity coefficients or high intercorrela- 
tions with remaining parts. Accumulation of additional data 
has subsequently resulted in elimination or modification of 
other parts, and the present test consists of five carefully 
selected work samples. One of these is used as practice mate- 
rial so that the final score is based upon four selections. These 
parts are described later. 

Considerable attention was given to developing a test 
which not only has a high validity coefficient, but which also 
appears valid to persons to whom the test is administered and 
to supervisors of such persons. It has been the author’s 
experience that apparent validity is of equal importance with 
statistical validity. A test must have both if it is to be used 
successfully for industrial selection and training. 


Development of Error Scores 


The test was originally scored for errors in such a way 
that the test situation was as comparable as possible with 
actual work situations. Test results were examined from the 
viewpoint of whether or not similar work would be accepted 
by supervisors if submitted by an employed stenographer. 
Penalties for errors varied in direct proportion to the time 
required to make the work acceptable. No penalty was given 
for errors which did not affect the acceptability of the work, 
such as neat erasures. Errors of such a nature that the item 
would have to be retyped in order to be usable were penalized 
in proportion to the time required to type that item. This was 
subsequently modified so that the penalty was in proportion 
to the time required to retype the item inasmuch as it was 
found that the retyping time was not proportional to the 
original typing time. Errors that could be corrected so that 
material could be used in an actual work situation were penal- 
ized in proportion to the time required to make the necessary 
corrections. 

Statistical analyses showed that the use of maximum 
penalties reduced the validity of the test by preventing differentiation
between poor typists. For example, in one test part,
the testee must alphabetize the material, tabulate in three 
columns, make a carbon copy, etc. A testee was given the 
maximum penalty if she neglected to alphabetize the material; 
others who made all of the errors listed above would be given 
the same penalty as the first girl. Although the error scores 
for these girls would be identical, the quality of work on the 
test item concerned would be far from the same. 

Originally no penalty was made for corrected errors if 
erasures were neatly made. The assumption was made that 
girls making erasures automatically penalized themselves by 
increasing the time required for completion of the test. Statis- 
tical analyses showed, however, that validity coefficients were 
increased by penalizing such corrected errors. 

On the basis of errors made in the preliminary forms of 
the test, an item analysis was made of the seriousness of each 
error. This analysis showed that three classes of errors were 
sufficient for a total error score. The three classes were named: 
(1) corrected errors, (2) minor errors, and (3) gross errors. 
On the basis of the item analysis, the following three defini- 
tions were established to explain the three classes and to assist 
in the subsequent determination of the seriousness of errors 
found so infrequently that they could be given no statistical 
weight:

Corrected Errors are those which have been corrected
by the typist (e.g., neat erasures). Each unit
correction (whether it be a letter, word, or phrase) is
counted as one corrected error.


Minor Errors are those which are correctible
(e.g., misspelled words, strike-overs, etc.) or which
detract from the form, arrangement, or neatness of the
finished work to the extent that the material is acceptable
for use but is below the desired standard.


Gross Errors are those which cannot be corrected 
unless the work is retyped (e.g., failure to make a 
carbon copy, failure to tabulate the material, etc.), 
those which are equivalent to two minor errors, or 
those which result in form, arrangement, or neatness 
below the minimum standard of acceptability. 




Error weights for gross, minor, and corrected errors were 
statistically determined by means of biserial validity coefficients 
for each type of error in a group of 63 employed typists, the 
criterion of success being grade of job successfully filled. Beta 
weights in raw score form were obtained through application 
of the Wherry-Doolittle test selection technique (5) using 
intercorrelations based on 250 applicants for typing positions.¹
In order to simplify scoring procedures, beta weights were 
rounded to the nearest whole number. The total error score 
is the sum of the corrected errors plus two times the sum of 
minor errors, plus four times the sum of gross errors. Data 
are summarized in Table 1. 


TABLE 1

DEVELOPMENT OF ERROR WEIGHTS

Type of      Biserial      Beta Weight     Raw Score      Rounded
Error        Validity r    Z-Score Form    Beta Weight    Weight

Gross          -.521         -.3421          -.3212          4
Minor          -.639         -.4700          -.1622          2
Corrected      -.430         -.3070          -.0915          1
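
The scoring rule stated in the text (corrected errors, plus two times minor errors, plus four times gross errors) can be illustrated with a short sketch. The weights are the rounded weights given in the article; the function name and the example counts are hypothetical.

```python
# Rounded error weights as stated in the article's text.
ERROR_WEIGHTS = {"corrected": 1, "minor": 2, "gross": 4}


def total_error_score(corrected, minor, gross):
    """Weighted sum of the three error classes for one typist."""
    return (ERROR_WEIGHTS["corrected"] * corrected
            + ERROR_WEIGHTS["minor"] * minor
            + ERROR_WEIGHTS["gross"] * gross)


# Hypothetical typist: 3 corrected, 2 minor, and 1 gross error.
example_score = total_error_score(corrected=3, minor=2, gross=1)
print(example_score)  # 3*1 + 2*2 + 1*4 = 11
```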





Use of Combined Speed-Accuracy Score 

In some cases it is desirable to interpret test scores from 
the two viewpoints of speed and accuracy. Usually, however, 
a combined score is preferable inasmuch as speed is worth 
little if not accompanied by accuracy, and accuracy is worth 
little if not accompanied by speed. Further, any typist can in- 
crease her speed at a sacrifice of accuracy or improve accuracy 
at a sacrifice of speed; therefore a combined score which is a 
function of both speed and accuracy will describe the typing 
performance better than either speed or accuracy alone. 

Usual methods for combining scores (such as converting
to standard scores or weighting by the reciprocal of the
standard deviation) cannot be used with these data inasmuch
as the distributions are not of the same shape and error
distributions are far from symmetrical (3). Time distributions
closely approximate normality and error distributions are
greatly skewed toward the high error end. This is as expected,
because one end of the distribution will be zero errors, whereas
there is no limit at the other end of the distribution.

¹This procedure assumes that the larger group is comparable in all relevant
respects with the smaller (criterion) group. The multiple correlation obtained
from intercorrelations based on the expanded group may be larger or smaller
than that obtained from intercorrelations limited to the criterion group. The
use of an expanded group, however, has been found to increase test validity
when the test is used with another group in a follow-up study (4).

Error scores were transformed into “converted errors” by 
means of the best fitting curved line in accordance with Horst’s 
method (3). The resultant distribution, based on 636 cases, 
was normal, and was expressed in terms of a mean and sigma 
equal to that of the time distribution. Total error scores are 
changed into converted error scores by means of Table 2. The 
time score can be added directly to the converted error score 
to obtain a combined score giving equal weight to speed and 
accuracy. Obviously, the time and converted error scores can 
be weighted in any other desired manner. 


TABLE 2

TABLE FOR CHANGING TOTAL ERRORS INTO
CONVERTED ERROR SCORE

T.E.   C.E.S.       T.E.   C.E.S.       T.E.    C.E.S.      T.E.      C.E.S.

 0       26         15       69         30        86        48-49       98
 1       30         16       70         31        87        50-52       99
 2       34         17       72         32        88        53-56      100
 3       38         18       73         33        88        57-60      101
 4       42         19       74         34        89        61-65      102
 5       46         20       75         35        90        66-71      103
 6       49         21       77         36        91        72-77      104
 7   [illegible]    22       78         37        91        78-83      105
 8       55         23   [illegible]    38        92        84-89      106
 9   [illegible]    24       80         39        93        90-95      107
10       59         25       81         40        93        96-101     108
11       61         26       82         41        94       102-108     109
12       63         27       83         42-43     95       109-114     110
13       65         28       84         44-45     96       115-120     111
14       67         29       85         46-47     97       121-126     112





T.E.= total errors obtained by 4 x gross, plus 2 x minor, plus corrected 
errors. 

C.E.S.= converted error score which can be combined directly with time 
score. 
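
The conversion in Table 2 amounts to a banded lookup. The sketch below is one way to express it in Python; the names are illustrative assumptions, and the bands are transcribed only from rows that are legible in this copy (the entries for 7, 9, and 23 total errors are left out rather than guessed, so the function raises an error for them).

```python
# (lowest T.E., highest T.E., C.E.S.) bands from the legible rows of Table 2.
TABLE_2_BANDS = [
    (0, 0, 26), (1, 1, 30), (2, 2, 34), (3, 3, 38), (4, 4, 42), (5, 5, 46),
    (6, 6, 49), (8, 8, 55), (10, 10, 59), (11, 11, 61), (12, 12, 63),
    (13, 13, 65), (14, 14, 67), (15, 15, 69), (16, 16, 70), (17, 17, 72),
    (18, 18, 73), (19, 19, 74), (20, 20, 75), (21, 21, 77), (22, 22, 78),
    (24, 24, 80), (25, 25, 81), (26, 26, 82), (27, 27, 83), (28, 28, 84),
    (29, 29, 85), (30, 30, 86), (31, 31, 87), (32, 33, 88), (34, 34, 89),
    (35, 35, 90), (36, 37, 91), (38, 38, 92), (39, 40, 93), (41, 41, 94),
    (42, 43, 95), (44, 45, 96), (46, 47, 97), (48, 49, 98), (50, 52, 99),
    (53, 56, 100), (57, 60, 101), (61, 65, 102), (66, 71, 103),
    (72, 77, 104), (78, 83, 105), (84, 89, 106), (90, 95, 107),
    (96, 101, 108), (102, 108, 109), (109, 114, 110), (115, 120, 111),
    (121, 126, 112),
]


def converted_error_score(total_errors: int) -> int:
    """Look up the converted error score (C.E.S.) for a total error score."""
    for low, high, ces in TABLE_2_BANDS:
        if low <= total_errors <= high:
            return ces
    raise ValueError(f"no legible Table 2 entry for T.E. = {total_errors}")


print(converted_error_score(14))  # 67
print(converted_error_score(51))  # 99
```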


Nature of Test


The Typing Ability Analysis² consists of five parts, each
being complete in itself. The first part is not scored, and con-
sists of typing identification material such as name, address,
and date. Part Two consists of approximately 150 words of
a draft of part of an article to be typed. The work copy is in
typed form, but contains thirty-five errors which are marked
for correction. Each error is accompanied by the correct form
which is to be used. Part Three requires the tabulating of
seven lines in three columns, together with appropriate column
headings, title, etc. The fourth part consists of a letter ninety
words in length. The letter is written in longhand and con-
tains ten changes also made in longhand. Part Five requires
alphabetizing and tabulating fifteen lines of authors’ names,
book titles, and publication dates, together with typing column
headings and title.

All test parts contain instructions such as, “make a carbon
copy on yellow paper,” “type the heading in capitals and un-
derline it,” “place your initials and the present date in the
lower left-hand corner,” etc. Failure to follow directions is
penalized. In a few cases penalties are made for items not
specifically mentioned in the instructions; for example, failure
of the typist to place the date or her initials on the letter.
Such penalties are made only for fundamental errors and fail-
ure to follow universal practice as taught in all typing classes
and required of all industrial typists. Table 3 contains a list
of all probable errors in each test part, classed according to
whether they are scored as corrected, minor, or gross errors.

The Typing Ability Analysis is a work-limit test, each girl
being permitted to complete the test. The shortest testing
time in 636 cases was 26 minutes, and the longest time was
120 minutes. The average (mean) time required by industrial
applicants is 61 minutes, and by high school seniors is 78
minutes. Although these average times are lengthy when
compared with other typing tests, the increased time is warranted
by the high validity of the test. Inasmuch as the test
administrator does no work after starting the test except to record
the finishing time of the testee, the time required is of little
practical importance to the administrator. The time element
becomes important only if the typewriter being used cannot be
spared for the required time, or if the testee objects to the
time required. It has been the experience of the author that
the practical appearance of the test tends to eliminate
objections of testees to the time required.

²Published by Science Research Associates.


TABLE 3

CLASSIFICATION OF ERRORS

[The source classes each error as Gross, Minor, or Corrected by a mark in
one of three columns. Most marks are illegible in this copy; an “x” below
reproduces a surviving mark, apparently from the Gross column.]

Part II—ROUGH DRAFT

    Not on page 15
    No carbon copy
    Carbon not on yellow paper
    Not double spaced
    Omission of word or phrase
    Did not make indicated change
    Strikeover
    Misspelled word
    Poor appearance
    Other error
    Corrected errors

Part III—TABULATION

    Not on page 13
    No carbon copy
    Carbon not on yellow paper
    No title
    Title not in caps
    Title not underlined
    Headings not underlined
    Not tabulated
    Omit “1941” and/or “1942”
    Omit “Type of Paper”
    Columns in wrong order
    3 or more figures out of alignment
    1 or 2 figures out of alignment
    Omit word “total”
    No line above “total”
    No line below “total”
    “Total” lines not extended
    No initials
    No date
    Strikeover
    Misspelled word
    Incorrect figure
    Poor appearance
    Other error
    Corrected errors

Part IV—LETTER

    Not on letterhead
    No carbon copy
    Carbon not on yellow paper
    More than 4½” after last line              x
    3½” to 4½” after last line
    No date
    No initials
    Omit “encl.”
    Less than 3 lines for signature            x
    Did not make indicated change              x
    Strikeover
    Misspelled word
    Poor appearance                            x
    Other error                                x
    Corrected errors

Part V—ALPHABETIZING

    Not on page 9
    No carbon copy                             x
    Carbon not on yellow paper
    No title                                   x
    Title not in caps
    Title not underlined
    Column headings not underlined
    Omit 2 or 3 column headings                x
    Omit 1 column heading
    Not tabulated                              x
    Less than 3 spaces between columns
    3 or more items out of alignment           x
    1 or 2 items out of alignment
    Initials precede names                     x
    Line in wrong order (max: 2 gross)         x
    Line omitted                               x
    Incorrect publication date
    Incorrect book title
    Strikeover
    Misspelled word
    More than 5 punctuation errors             x
    1-5 errors in punctuation
    Poor appearance                            x
    Other error                                x
    Corrected errors

In some cases it may be desirable to administer the test 
with a time limit, particularly when all that is desired is a yes 
or no decision as to whether or not an applicant should be 
hired. The time limit should be determined in such cases by 
deciding on the lowest percentile in terms of combined score 
which will be acceptable. Assuming no errors, the converted 
error score for no errors (26) should be deducted from the 
combined score representing the previously selected percentile. 
The resultant figure will give the time limit to be allowed. Or, 
if the administrator prefers, the time limit can be called when 
a desired percentage of the applicants have completed the test, 
all applicants who have failed to complete the test being elimi- 
nated from consideration. Use of the test with a time limit 
makes it impossible to secure a speed, accuracy, or combined 
score. Results therefore can not be used with maximum effec- 
tiveness for guidance or training. 
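
The time-limit rule in the paragraph above is a single subtraction, shown here as a minimal Python sketch. The function name is an illustrative assumption; the constant 26 is the converted error score for zero errors given in the text, and the cutoff of 128 in the example is the 50th-percentile industrial SA score from Table 4, chosen only for illustration.

```python
NO_ERROR_CONVERTED_SCORE = 26  # converted error score for zero total errors


def time_limit_minutes(cutoff_combined_score: float) -> float:
    """Longest time in which an error-free paper still reaches the cutoff SA score."""
    limit = cutoff_combined_score - NO_ERROR_CONVERTED_SCORE
    if limit <= 0:
        raise ValueError("cutoff is below the best possible converted error score")
    return limit


# Example: accept applicants at or above the industrial 50th percentile (SA = 128).
print(time_limit_minutes(128))  # 102
```

Because the limit assumes no errors, an applicant who finishes inside it may still fall short of the chosen percentile once errors are counted.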


Directions for Administering the Test 


The Typing Ability Analysis is practically self-administer- 
ing, and may be given either individually or in groups. In 
addition to a typewriter, each person taking the test should 
have the following materials: eraser, erasing shield, pencil, a 
sheet of carbon paper, 4 sheets of yellow paper for carbon 
copies, and a test booklet. 

Before starting the test, each testee is permitted sufficient
time to become familiar with the typewriter being used. Test
booklets are issued with the instructions, “Read the instruc-
tions on the first page of the test. Do not turn the page or
start the test until told to do so.” Instructions appearing on
the first page of the test booklet are as follows:

This test measures ability to do the kind of typing 
that is required in business and industry. 

Test results will be judged by usual office standards 
and will be rated on the basis of accuracy, speed, and 
form. Errors will be penalized in proportion to their 
seriousness, least for errors which have been corrected, 
more for errors which could have been corrected, and 
most for errors which would require retyping of the 
part in which they occur. 

Errors may be corrected by erasure. Do not re- 
type any part unless absolutely necessary, and in such 
case use the back side of the same sheet of paper. 


Work as rapidly as possible, but do the kind of 
work desired by an employer. 

After the instructions have been read, the examiner gives 
the signal to start the test. The exact time that each person 
starts and finishes the test is recorded. When the tests are 
completed, the examiner makes sure that the items are in 
correct order, and then staples all parts together. The time 
of starting and finishing is recorded on the first page of the 
test. 

Scoring 

The time (speed) score is the number of minutes required 
to complete the test. 

The error score is based on three types of errors: (1) 
corrected errors, (2) minor errors, and (3) gross errors, an 
explanation of which was given earlier. 

Each test part is proofread carefully and compared with 
the instructions. Errors are marked on the test by encircling 
them with a red pencil, and recorded on the rating sheet. The 
rating sheet contains a list of all errors commonly made, but 
is not a complete list of all possible types of errors. When 
unlisted errors are made, the definitions for errors as given 
previously should determine whether they are gross or minor. 




A single error is not penalized twice. For example, figures 
out of alignment are penalized as such. An additional penalty 
is not given for poor appearance. If, in copying the rough
draft, the typist spells “screen” as “screne,” a penalty is given for
either misspelled word or failure to make indicated change.
The error is not penalized in both ways.

When penalizing results for “poor appearance,” the 
quality of paper is taken into consideration, inasmuch as neat 
erasures are almost impossible on some types of paper, but 
are easily made on other types. 


Norms 


As has previously been mentioned, a combined score will 
generally be more valuable than separate speed and accuracy 
scores. Correlational and regression equation techniques 
showed that for the typing jobs considered in this study, speed 
and accuracy were approximately equal in importance. Com- 
bined scores of maximum efficiency should be obtained by mul- 
tiple correlations and regression equations based on test results 
of typists hired for each company using the test. Norms given 
here for three different weightings of speed and accuracy, 
however, will be adequate approximations for most jobs. The 
most suitable of the three will generally be the SA score which 
gives equal weight to speed and accuracy. It is obtained by 
adding the time score to the converted error score. The 2SA 
score (two times the speed score plus the converted error 
score) weights speed and accuracy in the ratio 2:1 and is suit- 
able for jobs that require fast speed and where accuracy is 
comparatively unimportant (as might be the case for a rough
draft copy typist). The S2A score (speed score plus two 
times the converted error score) weights speed and accuracy 
in the ratio 1:2 and is suitable for jobs placing a premium on 
high accuracy and where speed is comparatively unimportant 
(as in the case of some typists of legal documents). 
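
The three weightings just described can be computed side by side. The following is a small Python sketch with illustrative names (they are assumptions, not the author's notation beyond SA, 2SA, and S2A); the inputs are the time in minutes and the converted error score. Since both components grow with slower or less accurate work, a lower combined score is the better one.

```python
from typing import NamedTuple


class CombinedScores(NamedTuple):
    sa: float       # SA: speed and accuracy weighted equally
    two_sa: float   # 2SA: speed weighted 2:1 over accuracy
    s_two_a: float  # S2A: accuracy weighted 2:1 over speed


def combined_scores(minutes: float, converted_errors: float) -> CombinedScores:
    """Return the SA, 2SA, and S2A scores for one test paper."""
    return CombinedScores(
        sa=minutes + converted_errors,
        two_sa=2 * minutes + converted_errors,
        s_two_a=minutes + 2 * converted_errors,
    )


# Hypothetical paper: 61 minutes and a converted error score of 67.
scores = combined_scores(minutes=61, converted_errors=67)
print(scores.sa, scores.two_sa, scores.s_two_a)  # 128 189 195
```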

Industrial and educational norms are given in Table 4.
Industrial norms are based on 381 applicants for typing posi-
tions. Educational norms are based on 255 high school seniors
who were given the test from one to two months previous to
being graduated and who were in the advanced (second year)
typing class.

















TABLE 4

NORMS

                          Industrial                           Educational
                      N = 381 Applicants                   N = 255 H.S. Seniors
       Stan.                         Combined                            Combined
%ile   Score    Time Errors*      SA     2SA     S2A     Time Errors*      SA     2SA     S2A

 100    3.00      16              54      78      76       32       0      76     116     104
  99    2.33      26       0      71     103     103       41       1      91     139     128
  98    2.05      30       1      77     113     114       46       2      97     148     137
  95    1.64      37       3      88     128     130       51       4     106     162     152
  90    1.28      42       5      96     142     144       57       5     114     174     164
  80     .84      49       7     107     158     161       63       8     123     188     179
  70     .52      54       9     115     170     174       67      10     130     199     191
  60     .25      58      12     122     180     185       71      12     136     208     200
  50     .00      61      14     128     189     194       75      14     142     217     209
  40    -.25      65      16     134     199     204       78      16     147     225     217
  30    -.52      69      20     141     209     215       82      20     153     234     227
  20    -.84      74      25     149     221     228       87      24     160     245     238
  10   -1.28      81      32     159     237     245       93      31     170     260     253
   5   -1.64      86      41     168     250     259       98      38     177     272     266
   2   -2.05      92      58     178     266     275      104      49     186     285     280
   1   -2.33      97      81     185     276     286      108      66     192     295     290

   M           61.41   17.20  127.96  189.37  194.50    78.78   17.14  141.76  216.64  208.63
S.D.           15.14   13.60   24.63   37.26   39.37    14.33   14.08   21.77   33.53   34.81





*Percentiles and standard scores computed on basis of converted error 
scores. For convenience, errors reported in this table are total error scores. 

Norms are based on standard scores for selected percentile 
points. Analysis of data by means of the Otis Normal Per-
centile Chart³ showed marked linearity (normality) of all
distributions. Standard scores can consequently be used to 
determine percentile points. The converted error score was 
used for computing error norms, though for convenience the 
errors reported in Table 4 are expressed in terms of total 
errors rather than converted error scores. 
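
Because the norms rest on standard scores, a percentile point can be recovered from a distribution's mean and standard deviation. The Python sketch below assumes, on the evidence of Table 4, that a positive standard score corresponds to a lower (faster, more accurate) raw score, so the product is subtracted from the mean; the function name is an illustrative assumption, and the example uses the industrial SA mean and S.D. from the table.

```python
def norm_score(mean: float, sd: float, standard_score: float) -> int:
    """Raw score at a percentile point, given that point's standard score.

    Lower raw scores are better here, so a positive standard score
    lies below the mean.
    """
    return round(mean - standard_score * sd)


INDUSTRIAL_SA_MEAN = 127.96
INDUSTRIAL_SA_SD = 24.63

# Standard scores for the 95th, 50th, and 10th percentiles as listed in Table 4.
for percentile, z in [(95, 1.64), (50, 0.00), (10, -1.28)]:
    print(percentile, norm_score(INDUSTRIAL_SA_MEAN, INDUSTRIAL_SA_SD, z))
# 95 88
# 50 128
# 10 159
```

These three values agree with the industrial SA column of Table 4, which suggests the tabled norms were derived in essentially this way.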


Correlations Between Speed and Accuracy 


Correlations between speed (minutes) and accuracy (con-
verted error scores) are all low. Summarized data are given
in Table 5.

³Published by World Book Company.



TABLE 5

CORRELATIONS BETWEEN SPEED AND ACCURACY

Group                          N        r       Standard error

All cases                     636     +.137         ±.04
Industrial Applicants         193     +.213         ±.07
Civil Service Applicants      188     +.200         ±.07
High School Seniors           255     +.077         ±.06
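
The article does not say how the standard errors in Table 5 were obtained. As an assumption, the sketch below applies the classical large-sample formula for the standard error of a correlation, (1 - r²)/√N, which appears to reproduce the tabled values to two decimals; the function name is illustrative.

```python
from math import sqrt


def standard_error_of_r(r: float, n: int) -> float:
    """Classical large-sample standard error of a correlation coefficient."""
    return (1 - r ** 2) / sqrt(n)


TABLE_5 = [
    ("All cases", 636, 0.137),
    ("Industrial Applicants", 193, 0.213),
    ("Civil Service Applicants", 188, 0.200),
    ("High School Seniors", 255, 0.077),
]

for group, n, r in TABLE_5:
    print(f"{group}: {standard_error_of_r(r, n):.2f}")
# All cases: 0.04
# Industrial Applicants: 0.07
# Civil Service Applicants: 0.07
# High School Seniors: 0.06
```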

Validity


Validity is based on 67 employed typists in an industrial 
population. Typists were dichotomized on the basis of grade 
of job successfully filled, and validity determined by means of 
biserial coefficients. The p group consisted of 28 girls em- 
ployed in Kimberly-Clark Corporation’s home office or in 
positions of similar caliber in various mills of the corporation. 
The q group was composed of 39 girls employed in mill offices
and mailing departments of the same corporation. All girls
were engaged in work which was primarily typing.

Guilford (2) has pointed out that “a biserial r should
not be computed unless the graduated series of measurements
is reasonably well normally distributed and unless N is rela-
tively large—preferably when N is greater than 50. Another
important condition is that the cases be not too unevenly
divided between the two distributions.” These data fulfilled
the above requirements reasonably well. The N of 67 was
divided into p and q groups containing 42% and 58% of the
cases. Pearson’s chi-squared test of goodness of fit gave a P
of .748 for a combined score based on equal weighting of
speed and accuracy. Culler (1) classes this as an “excellent”
fit. Validity coefficients are given in Table 6.⁴


⁴It may be pointed out that an assumption underlying the derivation of
biserial r is that the dichotomized trait is in reality continuous and normally 
distributed. If this condition does not hold, the size of biserial r may be 
appreciably affected; the value of r indicating perfect relationship may be 
considerably greater than unity and obtained r’s greater than would other- 
wise be obtained. It is entirely possible that this assumption was not fulfilled 
with these data, although the magnitude of the effect cannot be measured due 
to lack of methods which can be used to demonstrate normality of criteria in 
cases such as this. An attempt was made to approach normality so far as pos- 
sible by including all grades of typists ranging from those in beginning typing 
jobs to those in the highest grade typing jobs of Kimberly-Clark Corporation. 




TABLE 6

VALIDITY COEFFICIENTS
(N = 67 Employed Industrial Typists)

Score                 Validity r     Standard Error

Combined (SA)            .957            ±.04
Time                     .796            ±.08
Converted errors         .711            ±.09
Raw errors               .659            ±.10
Gross errors             .456            ±.13
Minor errors             .555            ±.12
Corrected errors         .334            ±.14





Additional evidence of validity was secured by comparing
the differences in combined (SA) scores between various
groups. Differences are summarized in Table 7.

TABLE 7

COMPARISON OF GROUPS TO DETERMINE VALIDITY

Group                                          N        M       S.D.

A. Ind. typists—high job classification        28     91.89    14.04
B. Ind. typists—low job classification         39    135.05    20.85
C. Ind. typists—released, inadequate           10    178.50    21.71
D. Civil Service applicants—Jr. typists        63    122.00    20.71
E. Civil Service applicants—Asst. typists     125    130.22    19.57

Groups Compared      Critical Ratio      Significance

A and B                   9.97               .999
B and C                   5.18               .999
D and E                   2.60               .995

Critical ratios were computed by dividing the difference between the
means by the standard error of the difference. The standard
error of the mean was computed by formulae suitable for small
samples (2), as follows:

$$\sigma_M = \frac{\sigma}{\sqrt{N-1}} \quad \text{when } N \ge 20;$$

$$\sigma_M = \frac{\sigma}{\sqrt{N-2}} \quad \text{when } 10 \le N < 20.$$

All critical ratios are significant at the 1% level, thereby 
giving additional evidence of test validity by differentiating 
between groups which logically should show differences in 
ability. 
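
The critical ratios in Table 7 can be checked with a few lines of Python. This sketch rests on two assumptions: that the small-sample formulae read σ/√(N-1) for N of 20 or more and σ/√(N-2) for N from 10 up to 20, and that the standard error of the difference is the root of the summed squared standard errors of the two means (independent groups). Under that reading the tabled ratios are reproduced; the function names are illustrative.

```python
from math import sqrt


def se_mean(sd: float, n: int) -> float:
    """Standard error of a mean, using the small-sample denominators assumed above."""
    if n >= 20:
        return sd / sqrt(n - 1)
    if n >= 10:
        return sd / sqrt(n - 2)
    raise ValueError("no formula is given for N below 10")


def critical_ratio(m1: float, sd1: float, n1: int,
                   m2: float, sd2: float, n2: int) -> float:
    """Difference between two means divided by the standard error of the difference."""
    se_diff = sqrt(se_mean(sd1, n1) ** 2 + se_mean(sd2, n2) ** 2)
    return abs(m1 - m2) / se_diff


# Group statistics (M, S.D., N) from Table 7.
A = (91.89, 14.04, 28)
B = (135.05, 20.85, 39)
C = (178.50, 21.71, 10)
D = (122.00, 20.71, 63)
E = (130.22, 19.57, 125)

print(round(critical_ratio(*A, *B), 2))  # 9.97
print(round(critical_ratio(*B, *C), 2))  # 5.18
print(round(critical_ratio(*D, *E), 2))  # 2.6
```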


Reliability 


No adequate situation has yet been found in which to 
secure accurate reliability coefficients. Split-half methods are 
inapplicable inasmuch as test parts were selected on the basis 
of low intercorrelations as well as high validity coefficients. 
Two equivalent forms have been constructed; however, their 
use will usually be inadequate for reliability coefficients be- 
cause of the effect of practice or disuse between testing periods. 
The same is true of repeated administration of one test form. 


Table 8 presents reliability data obtained by administering 
two test forms to 63 high school seniors in their second-year 
typing class. The first test administration was in April and 
the second in May. Reliability is probably higher than indi- 
cated, inasmuch as practice between the two test administra- 
tions varied considerably; e.g., some girls missed many typing 
classes because of commencement activities whereas others 
received considerable extra practice because of typing material 
for the school annual. Reliability coefficients quoted here are 
thus lowered by irrelevant influences of speed of learning and 
opportunity to learn. In spite of the unfavorable conditions 
under which reliability was secured, results nevertheless indi- 
cate reasonable reliability. 

TABLE 8

RELIABILITY AND PROBABLE ERRORS
(N = 63 High School Seniors)

                          Reliability                Probable Error of
                     Coefficient    Index      Obtained Score    True Score

Time                    .768        .876            4.66            4.08
Converted Errors        .720        .848            5.47            4.64
SA Score                .832        .912            6.02            5.49
2SA Score               .846        .920            9.23            8.49
S2A Score               .800        .894           10.50            9.39
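
The article does not give the formulas behind the Index and Probable Error columns. The sketch below assumes the classical definitions: the index of reliability is the square root of the reliability coefficient, the probable error of an obtained score is 0.6745 σ √(1 - r), and the probable error of an estimated true score is 0.6745 σ √(r(1 - r)). Under these assumptions the true-score column equals the obtained-score column multiplied by the index, which matches the tabled figures. Function names are illustrative.

```python
from math import sqrt

PROBABLE_ERROR_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def reliability_index(r: float) -> float:
    """Index of reliability: square root of the reliability coefficient."""
    return sqrt(r)


def pe_obtained_score(sd: float, r: float) -> float:
    """Probable error of an obtained score (classical formula, assumed here)."""
    return PROBABLE_ERROR_FACTOR * sd * sqrt(1 - r)


def pe_true_score(sd: float, r: float) -> float:
    """Probable error of an estimated true score (classical formula, assumed here)."""
    return PROBABLE_ERROR_FACTOR * sd * sqrt(r * (1 - r))


# Time row of Table 8: reliability .768, probable error of obtained score 4.66.
print(round(reliability_index(0.768), 3))         # 0.876
print(round(4.66 * reliability_index(0.768), 2))  # 4.08
```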





Interpretation and Use of Scores 


The Typing Ability Analysis is now being used for several
purposes and in several types of situations. The purpose for
its use in any particular situation determines the way in which
results should be interpreted and utilized.


Various companies are using the analysis for selecting 
typists for specific openings in stenographic and secretarial 
work. The test obviously should not be used as the sole means 
of selecting typists. Many factors must be taken into con- 
sideration, and this analysis measures only one group of such 
factors. The test should be used as a supplement to other 
selection methods and procedures, and test results should be 
interpreted in light of all other pertinent factors. Most com- 
panies using the test for employee selection do not discuss test 
results with applicants, although some believe that the time 
required for such discussion is warranted on the basis of 
improved public relations. 


When the analysis is used for vocational selection, the 
industrial norms will usually apply because the employment 
manager will be interested most in knowing how a given 
applicant compares with other applicants. In the case of a 
recent high school graduate without any job experience, the 
employment manager may also wish to know how the applicant 
compares with high school seniors. Thus he is able to estimate 
not only how qualified she is at the present time, but also how 
satisfactory she is apt to be after securing typing experience. 


A second major use of the analysis is in training currently 
employed typists. Results are discussed with the girl concerned 
and she is told whether her speed and accuracy are acceptable, 
the type (or types) of error she is prone to make, and other 
shortcomings which should be corrected in order that her work 
may be improved. 

A third use of the analysis has been in the upgrading of
industrial employees. Results have been used in the same way
as in training in order to help typists prepare themselves for
higher-caliber typing jobs, or to enable girls working in the
plant on production jobs to fit themselves for typing jobs in
the office.




High schools are using the analysis for vocational guid- 
ance. Such usage generally includes suggestions for improving 
typing ability as well as recommendations regarding types of 
jobs which can be filled successfully. High school teachers will 
usually be interested in using the educational norms, although 
it may also be of value to determine how a particular student 
who will soon be a job applicant compares with other job 
applicants. Although high school teachers sometimes believe 
that such comparison is unfair due to lack of experience on 
the part of high school students, it must be remembered that 
most industrial personnel men are more interested in hiring 
an applicant with good typing ability than in hiring a promis- 
ing high school student who compares favorably with other 
students but who cannot compete successfully with other job 
applicants. This is particularly true during depression periods 
when jobs are scarce and applicants are numerous. 

The Typing Ability Analysis can also be used to compare 
the ability of various typing classes, efficiency of different 
teachers, rate of progress, etc., in all situations where typing 
ability is defined as those factors which differentiate between 
successful and unsuccessful typists in the industrial situation. 


REFERENCES 


1. Culler, E. “Studies in Psychometric Theory,” Journal of Experimental
Psychology, IX (1926), 169-194.

2. Guilford, J. P. Psychometric Methods. New York: McGraw-Hill,
1936, 51-52, 351.

3. Horst, Paul. “Obtaining Comparable Scores from Distributions of
Dissimilar Shape,” Journal of the American Statistical Association,
XXVI (1931), 455-460.

4. Jurgensen, C. E. “Extension of the Minnesota Rate of Manipulation
Test,” Journal of Applied Psychology, (1942) In press.

5. Stead, W. H., Shartle, C. L., et al. Occupational Counseling
Techniques. New York: American Book Co., 1940, 245-252.















































MEASUREMENT ABSTRACTS* 


Bryan, Alice I. and Wilke, Walter H. “Audience Tendencies 
in Rating Public Speakers.” Journal of Applied Psychol- 
ogy, XXVI (1942), 371-381. 

Using the Bryan-Wilke Scale for rating public speeches,
the authors studied, with a variety of audiences, a number of
factors associated with audience ratings, such as factors
related to the analytical ability of the audience, time of rating,
effect of age of raters, influence of sex of raters, and intel-
ligence and personality of the speaker. Louise T. Grossnickle.





Carter, Harold D. “How Reliable are the Common Measures
of Difficulty and Validity of Objective Test Items?” 
Journal of Psychology, XIII (1942), 31-38. 

Two hundred college students, mostly juniors, were given 
an objective test, consisting of 80 items, of which 30 were 
true-false, 30 multiple choice, and 20 of the completion 
variety. The purpose was to ascertain the reliability of meas- 
ures of item difficulty and of item validity by means of dif- 
ferent subgroups. The author found that a measure of dif- 
ficulty of test items, based upon a representative sampling, 
yielded a higher reliability coefficient than that obtained from 
the ordinary method of using good and poor students. K. S. 
Yum. 





DuBois, Philip H. “A Note on the Computation of Biserial r
in Item Validation.” Psychometrika, VII (1942), 143-
146.

A method of computing biserial coefficients of correlation
through the use of punch card tabulating equipment is pre-
sented. Each item is assigned a separate column and successes
are punched 1. By arranging the cards on the criterion vari-
able and obtaining progressive sums on several columns
simultaneously, it is possible to obtain data for several corre-
lations in one run of the cards through the machine. (Courtesy
Psychometrika.)

*Edited by Forrest A. Kingsbury.





Engelhart, Max D. “Unique Types of Achievement Test
Exercises.” Psychometrika, VII (1942), 103-115.

In this article are presented a number of unusual achieve- 
ment test exercises of both the essay and the objective types. 
These exercises may suggest to others engaged in the construc-
tion of achievement tests certain forms which they may find
useful either as models or as points of departure in the inven-
tion of new forms. The article also calls attention to certain
problems which must be solved if achievement testing is to
have a sound, scientific basis. (Courtesy Psychometrika.)





Estes, Stanley G. “A Study of Five Tests of ‘Spatial’ Ability.”
Journal of Psychology, XIII (1942), 265-271.

The object of the study was to determine the extent to 
which each of five tests, all of them requiring response to 
spatial relationships, were related to each other and to achieve- 
ment in descriptive geometry, a subject where the ability under 
consideration was of basic importance. The correlations of 
four of these tests with descriptive geometry were all reliably 
greater than zero and did not differ significantly from each 
other. Therefore, the author concluded that the tests, with 
the exclusion of the Crawford Structural Visualization Test, 
were equally valid with the criterion he used. K. S. Yum. 





Ferguson, Leonard W. and Lawrence, Warren R. “An Ap- 
praisal of the Validity of the Factor Loadings Employed 
in the Construction of the Primary Social Attitude Scales.” 
Psychometrika, VII (1942), 135-138. 




In this article the authors examine the effect of including 
alternate test forms in a factor matrix upon the validity of 
the resultant factor loadings, finding that in this particular 
instance the effect is negligible. Comparisons of the factor 
loadings derived from matrices in which only one of the alter- 
nate test forms is included with those in which both forms are 
included reveal practically no difference in the magnitude of 
either the original or rotated factor loadings, or in that of the 
computed communalities. (Courtesy Psychometrika.) 





Ghiselli, Edwin E. “Estimating the Minimal Reliability of a 
Total Test from the Intercorrelations Among, and the 
Standard Deviations of, the Component Parts.” Journal 
of Applied Psychology, XXVI (1942), 332-337. 

Due to the nature of a test, two equivalent parts are not 
available for estimating its reliability. However, if, in all of 
the parts, sigmas are equal and the intercorrelations are equal, 
then the Spearman-Brown correction formula for any length 
can easily be derived from a general formula for the reliability 
coefficient of the total test. Since r,, = r?,. will be a minimum 
estimate of r,,, it is possible to obtain a minimal reliability 
coefficient of the total test. K. S. Yum. 





Kelley, Truman L. “The Reliability Coefficient.” Psycho-
metrika, VII (1942), 75-83.


The reliability coefficient is unlike other measures of corre- 
lation in that it is a quantitative statement of an act of judg- 
ment—usually the test-maker’s—that the things correlated are 
similar measures. Attempts to divorce it from this act of 
judgment are misdirected, just as would be an attempt to 
eliminate judgment of sameness of function of items when a 
test is originally drawn up. A “coefficient of cohesion,” en-
tirely devoid of judgment, measuring the singleness of test
function is proposed as an essential datum with reference to a
test, but not as a substitute for the similar-form reliability co-
efficient. (Courtesy Psychometrika.)





Kuhlmann, F. and Odoroff, M. E. “Verification of the Heinis 
Mental Growth Curve on Results with the Stanford-Binet 
Tests.” Journal of Psychology, XIII (1942), 355-364. 


The usefulness of the I.Q. depends upon its constancy for 
the bright child and the dull child as well as for the typical 
child. However, this assumption is not warranted for all levels 
of intelligence. The late Dr. Kuhlmann preferred the index 
based on the Heinis mental growth curve because he believed 
in its superiority over the I.Q. for predictive purposes. This 
particular study, based on a large number of cases, shows that 
the average Stanford-Binet I.Q. of a group of special class
pupils drops approximately 1.5 points per year or a total of
15 points between the ages of 6 and 16, while the average
Heinis “personal constant,” which Kuhlmann has renamed the
“per cent of average” score, for the same cases shows no
tendency to increase or decrease. Louise T. Grossnickle.





Moffie, Dannie J. “A Non-verbal Approach to the Thurstone
Primary Mental Abilities.” Journal of General Psychology,
XXVII (1942), 35-61.


An attempt was made to measure five of the Primary 
Mental Abilities—perceptual speed, space, inductive reasoning, 
deductive reasoning, and memory—by means of performance 
tests. The author was successful in finding non-verbal meas- 
ures of the space, reasoning, and perceptual speed factors. 
Tests for inductive and deductive reasoning were found to be 
measures of one reasoning factor. Robert L. Cramer. 




Munroe, Ruth L. “An Experiment in Large Scale Testing by
a Modification of the Rorschach Method.” Journal of
Psychology, XIII (1942), 229-263.


This technique of the Inspection Diagnosis consists essen- 
tially of a systematic review of each protocol with special 
attention to twenty-four items known to be of significance in 
Rorschach diagnosis. The results show some very striking 
correspondence between the Rorschach ratings and three sep- 
arate lines of validation material, and suggest a strong prob- 
ability that the Rorschach Inspection Diagnosis is a valid and 
useful technique for large scale testing. Louise T. Grossnickle. 





Powell, Norman J. and Levine, Harold. “Reliability of the
Civil Service Oral Examination.” American Journal of
Psychology, LV (1942), 385-393.


Ninety-nine applicants who had passed a written examina- 
tion for the position of Junior Psychologist were interviewed 
and rated in the conventional manner by two panels acting with 
varying degrees of independence. Considerable differences in 
ratings were found in all cases. Robert L. Cramer. 





Stagner, Ross and Katzoff, E. T. “Fascist Attitudes: Factor
Analysis of Item Correlations.” Journal of Social
Psychology, XVI (1942), 3-9.


Eighteen statements reflecting Fascist thought were pre- 
sented to one hundred college students to be checked according 
to agreement or disagreement. A centroid factor analysis of 
the correlations between items showed three factors to be 
present: concern over protection of property rights, lack of 
sympathy for the unfortunate, and an aggressive nationalism. 
Robert L. Cramer. 




Taylor, William S. “Partialling out Sums of Squares and 
Products in Calculating Correlations with Non-homo- 
geneous Data.” British Journal of Psychology, XXXII 
(1942), 318-323. 


If the population tested is homogeneous, a coefficient of 
correlation calculated directly from the deviations of indi- 
vidual scores about the grand mean will give a reliable indica- 
tion of the correlation between the scores of the individuals 
tested. Where the population is not homogeneous, group dif- 
ferences may be significant. The correlation desired is that 
free from the influence of group differences. In this latter 
case, it is necessary to partial out the sums of squares and sums 
of products of deviations from the mean, using only those 
attributable to the deviations “within groups.” K. S. Yum. 
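
The contrast drawn in this abstract can be sketched numerically. The example below pools the sums of squares and products of deviations about each group's own mean and compares the resulting within-groups correlation with the one computed about the grand mean. The function names and the two small groups of scores are hypothetical, chosen only to make the effect visible; this is not a reproduction of the author's worked procedure.

```python
from math import sqrt


def total_r(groups):
    """Correlation from deviations about the grand mean (groups ignored)."""
    xs = [x for gx, _ in groups for x in gx]
    ys = [y for _, gy in groups for y in gy]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def within_groups_r(groups):
    """Correlation from pooled within-group sums of squares and products."""
    sxx = syy = sxy = 0.0
    for gx, gy in groups:
        n = len(gx)
        mx, my = sum(gx) / n, sum(gy) / n  # each group's own means
        sxy += sum((x - mx) * (y - my) for x, y in zip(gx, gy))
        sxx += sum((x - mx) ** 2 for x in gx)
        syy += sum((y - my) ** 2 for y in gy)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: two groups that differ widely in level on both variables.
groups = [
    ([1, 2, 3], [3, 2, 1]),
    ([11, 12, 13], [13, 12, 11]),
]
print(round(total_r(groups), 3))   # 0.948
print(within_groups_r(groups))     # -1.0
```

In this artificial case the group difference alone produces a high positive total correlation, although within each group the relation is perfectly negative.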





Thomson, Godfrey H. “Following up Individual Items in a 
Group Intelligence Test.” British Journal of Psychology, 
XXXII (1942), 310-317. 

The article describes the technique used for item selection 
in the construction of Moray House Test 24, a group intel- 
ligence test, and presents a later follow-up study of the test 
items in their discriminating function. The research reveals 
that the predictive power of the various test items differs with 
different levels of educational achievement. The items that
predict well within the secondary school are not necessarily
those that best discriminate potential secondary school
pupils from those not suitable. K. S. Yum.





Thorndike, Robert L. “Regression Fallacies in the Matched
Groups Experiment.” Psychometrika, VII (1942), 85-
102.

This paper is concerned particularly with certain regression 
effects which appear whenever matched groups are drawn from 
populations which differ with regard to the characteristics
being studied. It is shown that regression will produce sys-
tematic differences between these groups on measures other 
than those upon which they were specifically matched. The 
size and direction of these differences depend upon the dif- 
ferences between the parent populations both in the matching 
and in the experimental variables and upon the correlation 
between the matching and experimental variables. Formulas 
are presented for estimating the expected regression effect. 
Several alternative procedures are suggested for avoiding the 
erroneous conclusions which the regression effect is likely to 
suggest. (Courtesy Psychometrika.) 
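
The regression effect described here can be illustrated with the ordinary linear regression equation. The sketch below is not the set of formulas presented in the paper; it assumes linear regression of the experimental variable on the matching variable within each parent population, and every numerical value is hypothetical.

```python
def expected_y(x, mean_x, mean_y, sd_x, sd_y, r):
    """Expected experimental score for cases matched at score x.

    Uses the ordinary regression equation within one parent population:
    predicted y = mean_y + r * (sd_y / sd_x) * (x - mean_x).
    """
    return mean_y + r * (sd_y / sd_x) * (x - mean_x)


# Hypothetical parent populations differing by 10 points on both variables,
# with equal sigmas (15) and the same correlation (.50) in each.
match_score = 105
group_a = expected_y(match_score, mean_x=110, mean_y=110, sd_x=15, sd_y=15, r=0.5)
group_b = expected_y(match_score, mean_x=100, mean_y=100, sd_x=15, sd_y=15, r=0.5)

print(group_a)            # 107.5
print(group_b)            # 102.5
print(group_a - group_b)  # 5.0
```

Although both groups are matched at 105, each regresses toward the mean of its own parent population, so a difference of 5 points on the experimental variable is expected even when no experimental treatment has been applied.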
































Tinkelman, Sherman. “Civil Service Test Item Preparation:
A Case Study.” Public Personnel Quarterly, III (1942),
3-74.

The author traces the evaluation of test items to be used
in a civil service examination, discussing source material,
validity, public relations impact, and revision of the items.
Robert L. Cramer.





Wolfle, Dael L. “Factor Analysis in the Study of
Personality.” Journal of Abnormal and Social Psychology,
XXXVII (1942), 393-397.


A review of the previous studies in this field singled out 
seven factors, each of which had appeared in three or more 
studies. These were will, cleverness, shyness, self-confidence, 
fluency, depression, and hypersensitivity. The author noted
two important characteristics of these personality factors.
They sometimes duplicate each other or sometimes cut across
one another. Chief emphasis was placed upon the statement that
factor analysis provides a powerful analytic tool for isolating
the important variables of human personality and that the
results thus obtained depend on the evaluation by clinicians
and experimentalists. K. S. Yum.







Yum, K. S. “Student Preferences in Divisional Studies and
Their Preferential Activities.” Journal of Psychology,
XIII (1942), 193-200.


The Kuder Preference Record was given to 193 college 
students for a study of their preferential interests in the seven 
major types, namely, scientific, computational, musical, artistic, 
literary, social service, and persuasive activities. The author
found that the mean profiles of the students in the physical,
biological, and social sciences, as well as the mean profiles
of men and women, were significantly and consistently
different on some of the major types of preferences. The
correlation coefficients between the
preference scores and academic achievement were negligible 
except in the case of the literary activities for the entire group 
and also for the group of men, and the computational activi- 
ties for the group of women. Louise T. Grossnickle. 








